R for genetics/genomics?R anyhow to analyse non genetic dataRRadegenet, poppr and pegasR Markdown to produce a report of the analyses, incorporating the workflow and resultsThis document is an R Markdown document. An R Markdown document is the combination of a Markdown document with chunks of R code in the middle that are evaluated using an R package called knitr.
R Markdown document can be used interactively or they can be turned into beautiful HTML document or other (e.g. PDF, Word, …) if you click on the button “Knit” on top.
The benefits of using an R Markdown document is that you have your text, scripts, results, plots and tables in a single place.
You can edit the current R Markdown file to add your own notes to it (but save it under a different name, so that you can go back to the original version if you need to) and to try out or even modify the R code.
If you prefer to see pure R code only, just do:
# file.remove("R4G_course.R") # uncomment and run to remove existing file
knitr::purl("R4G_course.Rmd")
Note: it will create a file called R4G_course.R but it won’t overwrite it if it already exists. So, if you want some changes to be taken into account, make sure you delete the R file beforehand (e.g., by running the first line in the chunk above).
Just write plain text in the script. You can use different number of stars to indicate:
*italic* -> italic*bold* -> bold**both** -> bothknitr syntaxUse ```{r} to start an R chunk and ``` to close it.
For example if you type 1 + 1 within a chunk, it will be displayed as:
1 + 1
## [1] 2
Then, to evaluate the chunk, just press CTRL-R after having put your mouse cursor inside the chunk (anywhere).
You can learn more about R Markdown on this online book or if you already know a little, the R Studio cheatsheet will help you to keep it all in mind.
R packages readyWe will use the R packages adegenet, poppr and pegas for the analyses and ggplot2, lattice and viridisLite for the plots. If you have already installed before, you can simply load the packages as follow:
library(adegenet)
library(poppr)
library(pegas)
library(ggplot2)
library(lattice)
library(viridisLite)
Note 1: If one of the package is missing, then you must install it BEFORE loading them. So you may need to use, for ex:
install.packages("adegenet")
Note 2: To be able to “knit” the R Markdown, you will also need other R packages but try knitting and R Studio should do the rest for you.
We also add a few extra functions we coded for you:
source("scripts/tools.R")
## [1] "All functions succesfully loaded"
Rgenepop formatWe will be using a very common type of input file for microsatellite data, which was originally developed for a stand-alone program called Genepop. You will learn how to create a genepop file with Jörns tomorrow. Today we will use one that we have made for you.
Here is the basic structure:
comment line
locus-1, locus-2, locus-3, ...
pop
Ind-1 , 100102 135135 204208 ...
Ind-2 , 100100 131139 200208 ...
Ind-3 , 102102 131139 200204 ...
Ind-4 , 000000 135139 208208 ...
...
the first line is a “comment line” - you can keep notes on your project here. E.g. “24 animals, 8 loci”
the second line contains the names of the loci (locus-1, etc).
the third line has a “pop flag” that indicates the next samples belong to the same population.
below this (until the next pop flag), every line is a multi-locus genotype per individual belonging to the population
Is everyone comfortable with the terms locus, allele and genotype?
RIn the folder data in this R Studio project you can find the genepop file called microbov_small.gen.
We will create an R object that contains the data in microbov_small.gen using the read.genepop() function of adegenet. Here we need to provide the path of the file as "/data/microbov_small.gen" using file =, and some information about how microsatellite data is encoded in the file using ncode =. (And here we also set the argument quiet = but you don’t have to, this is just to shorten this document by not displaying some messages.)
myData <- read.genepop(file = "data/microbov_small.gen", ncode = 3, quiet = TRUE)
Why did we use ncode = 3?
Now the data is read into R.
If there had been a problem with the data file (e.g. incorrectly formatted), then we would have got an error message. Let’s try:
badFormat <- read.genepop(file = "data/badFormatting.gen", ncode = 3, quiet = TRUE)
## Error in dimnames(x) <- dn: length of 'dimnames' [1] not equal to array extent
We “broke” this input file by including fewer locus names than data. The error message is telling us that the length of dimnames [1] (= number of locus names) does not correspond to the number of columns containing the data.
As you can see, R provides us a clue about what the problem is.
Another way we can generate an error is by providing wrong arguments to the function, despite having a properly formatted file:
badInput <- read.genepop(file = "data/microbov_small.gen", ncode = 2, quiet = TRUE)
## Error in read.genepop(file = "data/microbov_small.gen", ncode = 2, quiet = TRUE): some alleles are not encoded with 2 characters
## Check 'ncode' argument
Unsurprisingly, there will be an error if you type the path to the file incorrectly.
One way to limit such issues is to list all genepop files in the folder containing the data (to make sure that the file is where you are trying to read it):
dir(path = "data/", pattern = "*.gen")
## [1] "badFormatting.gen" "lynx.gen" "microbov_small.gen" "nancycats.gen"
R representations of your datagenind formatWe haven’t looked at our in data yet!
myData
## /// GENIND OBJECT /////////
##
## // 160 individuals; 15 loci; 143 alleles; size: 130.5 Kb
##
## // Basic content
## @tab: 160 x 143 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 5-14)
## @loc.fac: locus factor for the 143 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: read.genepop(file = "data/microbov_small.gen", ncode = 3, quiet = TRUE)
##
## // Optional content
## @pop: population of each individual (group size range: 15-31)
Geeky note: For more details on the content of the genind object, just try:
str(myData)
## Formal class 'genind' [package "adegenet"] with 11 slots
## ..@ tab : int [1:160, 1:143] 2 1 1 2 1 1 0 2 1 2 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:160] "AFBIBOR9503" "AFBIBOR9504" "AFBIBOR9505" "AFBIBOR9506" ...
## .. .. ..$ : chr [1:143] "INRA63.183" "INRA63.181" "INRA63.177" "INRA63.175" ...
## ..@ loc.fac : Factor w/ 15 levels "INRA63","INRA5",..: 1 1 1 1 1 1 1 2 2 2 ...
## ..@ loc.n.all: Named int [1:15] 7 5 12 5 10 9 7 7 12 9 ...
## .. ..- attr(*, "names")= chr [1:15] "INRA63" "INRA5" "ETH225" "ILSTS5" ...
## ..@ all.names:List of 15
## .. ..$ INRA63: chr [1:7] "183" "181" "177" "175" ...
## .. ..$ INRA5 : chr [1:5] "137" "141" "143" "139" ...
## .. ..$ ETH225: chr [1:12] "147" "157" "139" "141" ...
## .. ..$ ILSTS5: chr [1:5] "190" "186" "194" "184" ...
## .. ..$ HEL5 : chr [1:10] "149" "151" "163" "165" ...
## .. ..$ HEL1 : chr [1:9] "103" "109" "105" "107" ...
## .. ..$ INRA35: chr [1:7] "102" "104" "120" "110" ...
## .. ..$ ETH152: chr [1:7] "191" "195" "197" "193" ...
## .. ..$ INRA23: chr [1:12] "213" "215" "197" "199" ...
## .. ..$ ETH10 : chr [1:9] "209" "211" "207" "217" ...
## .. ..$ HEL9 : chr [1:11] "153" "159" "165" "155" ...
## .. ..$ CSSM66: chr [1:12] "181" "189" "185" "193" ...
## .. ..$ INRA32: chr [1:12] "162" "178" "180" "184" ...
## .. ..$ ETH3 : chr [1:11] "117" "125" "115" "127" ...
## .. ..$ BM2113: chr [1:14] "140" "142" "122" "134" ...
## ..@ ploidy : int [1:160] 2 2 2 2 2 2 2 2 2 2 ...
## ..@ type : chr "codom"
## ..@ other : NULL
## ..@ call : language read.genepop(file = "data/microbov_small.gen", ncode = 3, quiet = TRUE)
## ..@ pop : Factor w/ 7 levels "AFBIBOR9522",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..@ strata : NULL
## ..@ hierarchy: NULL
or for even more geeky details:
print.AsIs(myData)
As you will see, the genind object is a particular kind of list. Technically it is called a S4 object. Instead of accessing elements with $, when using S4 objects elements (called slots) are typically accessible with @ (even if the programmer behind adegenet made it possible to use $ anyway).
Since manipulating such objects is a little complicated, you should use the accessors, whenever available:
nInd(myData) # Number of individuals
## [1] 160
nLoc(myData) # Number of loci
## [1] 15
nPop(myData) # Number of populations
## [1] 7
locNames(myData) # Names of loci
## [1] "INRA63" "INRA5" "ETH225" "ILSTS5" "HEL5" "HEL1" "INRA35" "ETH152" "INRA23" "ETH10" "HEL9" "CSSM66" "INRA32" "ETH3" "BM2113"
alleles(myData) # List of all alleles
## $INRA63
## [1] "183" "181" "177" "175" "179" "185" "171"
##
## $INRA5
## [1] "137" "141" "143" "139" "149"
##
## $ETH225
## [1] "147" "157" "139" "141" "153" "149" "155" "143" "159" "151" "137" "145"
##
## $ILSTS5
## [1] "190" "186" "194" "184" "182"
##
## $HEL5
## [1] "149" "151" "163" "165" "167" "155" "157" "153" "161" "159"
##
## $HEL1
## [1] "103" "109" "105" "107" "101" "117" "113" "111" "115"
##
## $INRA35
## [1] "102" "104" "120" "110" "108" "114" "106"
##
## $ETH152
## [1] "191" "195" "197" "193" "201" "199" "205"
##
## $INRA23
## [1] "213" "215" "197" "199" "209" "205" "211" "203" "207" "217" "201" "193"
##
## $ETH10
## [1] "209" "211" "207" "217" "219" "221" "215" "223" "213"
##
## $HEL9
## [1] "153" "159" "165" "155" "161" "163" "167" "157" "149" "169" "151"
##
## $CSSM66
## [1] "181" "189" "185" "193" "197" "183" "199" "187" "179" "195" "171" "191"
##
## $INRA32
## [1] "162" "178" "180" "184" "202" "182" "176" "174" "164" "160" "204" "168"
##
## $ETH3
## [1] "117" "125" "115" "127" "103" "123" "129" "131" "119" "109" "113"
##
## $BM2113
## [1] "140" "142" "122" "134" "136" "146" "138" "130" "124" "150" "128" "132" "126" "144"
nAll(myData, onlyObserved = TRUE) # Number of alleles
## INRA63 INRA5 ETH225 ILSTS5 HEL5 HEL1 INRA35 ETH152 INRA23 ETH10 HEL9 CSSM66 INRA32 ETH3 BM2113
## 7 5 12 5 10 9 7 7 12 9 11 12 12 11 14
indNames(myData) # Names of individuals
## [1] "AFBIBOR9503" "AFBIBOR9504" "AFBIBOR9505" "AFBIBOR9506" "AFBIBOR9507" "AFBIBOR9508" "AFBIBOR9509" "AFBIBOR9510" "AFBIBOR9511" "AFBIBOR9512"
## [11] "AFBIBOR9513" "AFBIBOR9514" "AFBIBOR9515" "AFBIBOR9516" "AFBIBOR9517" "AFBIBOR9518" "AFBIBOR9519" "AFBIBOR9520" "AFBIBOR9523" "AFBIBOR9521"
## [21] "AFBIBOR9522" "AFBIZEB9453" "AFBIZEB9454" "AFBIZEB9455" "AFBIZEB9456" "AFBIZEB9457" "AFBIZEB9458" "AFBIZEB9459" "AFBIZEB9460" "AFBIZEB9461"
## [31] "AFBIZEB9462" "AFBIZEB9463" "AFBIZEB9464" "AFBIZEB9465" "AFBIZEB9466" "AFBIZEB9467" "AFBIZEB9468" "AFBIZEB9469" "AFBIZEB9470" "AFBIZEB9471"
## [41] "AFBIZEB9472" "AFBIZEB9473" "AFBIZEB9474" "AFBIZEB9475" "AFBIZEB9476" "AFBIZEB9477" "AFBIZEB9478" "AFBIZEB9479" "AFBIZEB9480" "AFBIZEB9481"
## [51] "AFBIZEB9482" "AFBIZEB9483" "AFBTLAG9402" "AFBTLAG9403" "AFBTLAG9404" "AFBTLAG9405" "AFBTLAG9406" "AFBTLAG9407" "AFBTLAG9408" "AFBTLAG9409"
## [61] "AFBTLAG9410" "AFBTLAG9411" "AFBTLAG9412" "AFBTLAG9413" "AFBTLAG9414" "AFBTLAG9415" "AFBTLAG9416" "AFBTND202" "AFBTND205" "AFBTND206"
## [71] "AFBTND207" "AFBTND208" "AFBTND209" "AFBTND211" "AFBTND212" "AFBTND213" "AFBTND214" "AFBTND215" "AFBTND216" "AFBTND217"
## [81] "AFBTND221" "AFBTND222" "AFBTND223" "AFBTND233" "AFBTND241" "AFBTND242" "AFBTND244" "AFBTND248" "AFBTND253" "AFBTND254"
## [91] "AFBTND255" "AFBTND257" "AFBTND258" "AFBTND259" "AFBTND284" "AFBTND285" "AFBTND292" "AFBTSOM9352" "AFBTSOM9353" "AFBTSOM9354"
## [101] "AFBTSOM9355" "AFBTSOM9356" "AFBTSOM9357" "AFBTSOM9358" "AFBTSOM9359" "AFBTSOM9360" "AFBTSOM9361" "AFBTSOM9362" "AFBTSOM9363" "AFBTSOM9364"
## [111] "AFBTSOM9365" "AFBTSOM9366" "AFBTSOM9367" "AFBTSOM9368" "AFBTSOM9369" "AFBTSOM9370" "AFBTSOM9371" "AFBTSOM9372" "AFBTSOM9373" "AFBTSOM9374"
## [121] "FRBTAUB9061" "FRBTAUB9062" "FRBTAUB9063" "FRBTAUB9064" "FRBTAUB9065" "FRBTAUB9066" "FRBTAUB9067" "FRBTAUB9068" "FRBTAUB9069" "FRBTAUB9070"
## [131] "FRBTAUB9071" "FRBTAUB9072" "FRBTAUB9073" "FRBTAUB9074" "FRBTAUB9075" "FRBTAUB9076" "FRBTBAZ15654" "FRBTBAZ15655" "FRBTBAZ25576" "FRBTBAZ25578"
## [141] "FRBTBAZ25950" "FRBTBAZ25954" "FRBTBAZ25956" "FRBTBAZ25957" "FRBTBAZ26078" "FRBTBAZ26092" "FRBTBAZ26097" "FRBTBAZ26110" "FRBTBAZ26112" "FRBTBAZ26352"
## [151] "FRBTBAZ26354" "FRBTBAZ26375" "FRBTBAZ26386" "FRBTBAZ26388" "FRBTBAZ26396" "FRBTBAZ26400" "FRBTBAZ26401" "FRBTBAZ26403" "FRBTBAZ26439" "FRBTBAZ26457"
popNames(myData) # Name of the last individual in each population
## [1] "AFBIBOR9522" "AFBIZEB9483" "AFBTLAG9416" "AFBTND292" "AFBTSOM9374" "FRBTAUB9076" "FRBTBAZ26457"
Some of the accessors can also be used to redefine some information (to handle with great care):
#let's give the pops some names:
#myPops <- c("Pop1", "Pop2", "Pop3", "Pop4", "Pop5", "Pop6", "Pop7")
myPops <- paste0("Pop", 1:nPop(myData))
popNames(myData) <- myPops
popNames(myData)
## [1] "Pop1" "Pop2" "Pop3" "Pop4" "Pop5" "Pop6" "Pop7"
pop(myData)
## [1] Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2
## [32] Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3
## [63] Pop3 Pop3 Pop3 Pop3 Pop3 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4
## [94] Pop4 Pop4 Pop4 Pop4 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop6 Pop6 Pop6 Pop6
## [125] Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7
## [156] Pop7 Pop7 Pop7 Pop7 Pop7
## Levels: Pop1 Pop2 Pop3 Pop4 Pop5 Pop6 Pop7
loci formatNot all R packages dealing with genetics/genomics use genind objects (it would be convenient if they did!). In particular, the package pegas which allows for the computation of many useful metrics relies on an alternative format called loci.
myData_loci <- genind2loci(myData) # or as.loci(myData)
myData_loci
The loci format is more simple than the geneind format (and so what you can do with it is more limited).
Geeky note: A loci object is simply a traditional data.frame (an S3 object) with an additional attribute called "locicol":
str(myData_loci)
## Classes 'loci' and 'data.frame': 160 obs. of 16 variables:
## $ population: Factor w/ 7 levels "Pop1","Pop2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ INRA63 : Factor w/ 18 levels "171/175","175/175",..: 17 16 10 17 10 10 9 17 10 17 ...
## $ INRA5 : Factor w/ 11 levels "137/137","137/139",..: 3 9 9 9 9 4 6 6 6 9 ...
## $ ETH225 : Factor w/ 29 levels "137/147","139/139",..: 19 8 2 10 25 23 19 26 3 28 ...
## $ ILSTS5 : Factor w/ 13 levels "182/184","182/186",..: 11 8 13 6 5 5 6 5 3 8 ...
## $ HEL5 : Factor w/ 28 levels "149/149","151/151",..: 1 6 7 28 17 17 27 26 26 28 ...
## $ HEL1 : Factor w/ 22 levels "101/101","101/103",..: 7 10 4 5 6 5 4 6 5 5 ...
## $ INRA35 : Factor w/ 13 levels "102/102","102/104",..: 2 4 4 4 4 4 4 2 4 4 ...
## $ ETH152 : Factor w/ 14 levels "191/191","191/195",..: 2 7 8 7 1 8 7 8 2 8 ...
## $ INRA23 : Factor w/ 32 levels "193/199","197/199",..: 31 3 4 11 11 4 26 20 19 9 ...
## $ ETH10 : Factor w/ 25 levels "207/207","207/209",..: 8 5 3 12 8 11 7 11 6 6 ...
## $ HEL9 : Factor w/ 29 levels "149/157","151/161",..: 3 6 9 14 6 18 9 4 4 23 ...
## $ CSSM66 : Factor w/ 34 levels "171/183","179/179",..: 10 10 8 33 24 18 22 20 20 20 ...
## $ INRA32 : Factor w/ 27 levels "160/204","162/162",..: 2 2 19 2 6 27 19 21 5 7 ...
## $ ETH3 : Factor w/ 30 levels "103/117","109/117",..: 10 13 8 10 9 10 9 6 25 5 ...
## $ BM2113 : Factor w/ 43 levels "122/122","122/124",..: 41 40 1 41 30 41 28 30 7 37 ...
## - attr(*, "locicol")= int [1:15] 2 3 4 5 6 7 8 9 10 11 ...
head(data.frame(myData_loci), n = 10) # first 10 rows
attr(myData_loci, "locicol") # columns that contain loci
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Let’s check some basic things about the data in myData.
An important consideration for analysis is the amount of missing data in our dataset. Several types of analyses don’t cope very well with missing data. In some cases this may lead to poor estimates, in other cases it may lead to samples or loci not being considered in the analysis.
We can use info_table() from poppr to find out about missing data, and also generate a convenient plot by including plot = TRUE.
missing <- info_table(myData, df = TRUE) # df = TRUE uses a long rather than
# wide format for the df
missing[order(missing$Missing, decreasing = TRUE), ] # We reorder the table to show the highest missing on top
info_table(myData, plot = TRUE, low = "white")
## Locus
## Population INRA63 INRA5 ETH225 ILSTS5 HEL5 HEL1 INRA35 ETH152 INRA23 ETH10 HEL9 CSSM66 INRA32 ETH3 BM2113 Mean
## Pop1 . . . 0.0476 . . . . . . . . . . . 0.0032
## Pop2 . . . . 0.0323 . . . 0.0645 . . . 0.0323 . . 0.0086
## Pop3 . . . 0.3333 . . . 0.0667 . . . . . . . 0.0267
## Pop4 . . . 0.1333 . 0.0333 0.0333 . 0.0333 . . . . . . 0.0156
## Pop5 . . 0.0435 0.1739 . . . . . . . . . 0.0435 . 0.0174
## Pop6 . . . 0.2500 . . . . . . . . . . . 0.0167
## Pop7 0.1667 0.2917 . 0.2500 0.0833 0.2083 . 0.2917 0.0417 0.3333 0.0417 0.2917 0.3750 0.3750 0.2083 0.1972
## Total 0.0250 0.0437 0.0063 0.1500 0.0187 0.0375 0.0063 0.0500 0.0250 0.0500 0.0063 0.0437 0.0625 0.0625 0.0312 0.0413
Geeky note: it is tricky but you can modify anything in this plot even when info_table() as no option for it:
theplot <- ggplot_build(last_plot())
theplot$data[[4]]$size <- 3
theplot$data[[4]]$angle <- 45
plot(ggplot_gtable(theplot))
We can see that there is missing data at several loci, and that this is particularly bad for our population Pop7.
As a general rule of thumb, we try to keep missing data below 5% in order to minimize the impact on analyses.
How do you think we can achieve this?
Using genotype_curve() we can check how many loci we need in order to discriminate our genotypes. This poppr function randomly samples loci without replacement and counts the number of multilocus genotypes observed.
gencurv <- genotype_curve(myData)
How many loci do we need in order to discriminate our genotypes?
Geeky note: you can get the exact numbers:
apply(gencurv, 2, range)
## NumLoci
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## [1,] 12 44 84 135 150 154 155 156 157 157 157 157 157 157
## [2,] 44 131 154 157 157 157 157 157 157 157 157 157 157 157
As missing data is concentrated in a few loci and populations, rather than being thinly spread throughout the dataset, we can remove the worst offenders: a locus (ILSTS5) and a population (Pop7).
We can ‘drop’ a population from the genind object using the following code, if we know the number (ID) of the population in the list of populations.
myData[pop = -c(7), drop = TRUE] # drop = TRUE updates the number of remaining alleles
Sometimes it is more useful to be able to remove populations by their name. For this we need a bit more code:
removePop <- c("Pop7")
myData <- myData[pop = !popNames(myData) %in% removePop, drop = TRUE]
Removing loci from a genind object works in a similar fashion. If you know the number (ID) of a locus, then you can use the short code myData[loc = -c(1)]. But again, it is better to use names.
removeLoc <- c("ILSTS5")
myData <- myData[loc = !locNames(myData) %in% removeLoc, drop = TRUE]
Now let’s check the new version of myData:
myData
## /// GENIND OBJECT /////////
##
## // 136 individuals; 14 loci; 137 alleles; size: 111.1 Kb
##
## // Basic content
## @tab: 136 x 137 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 5-14)
## @loc.fac: locus factor for the 137 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)
##
## // Optional content
## @pop: population of each individual (group size range: 15-31)
LEt’s also rerun info_table() to see what’s changed:
info_table(myData, plot = TRUE, low = "white")
## Locus
## Population INRA63 INRA5 ETH225 HEL5 HEL1 INRA35 ETH152 INRA23 ETH10 HEL9 CSSM66 INRA32 ETH3 BM2113 Mean
## Pop1 . . . . . . . . . . . . . . .
## Pop2 . . . 0.0323 . . . 0.0645 . . . 0.0323 . . 0.0092
## Pop3 . . . . . . 0.0667 . . . . . . . 0.0048
## Pop4 . . . . 0.0333 0.0333 . 0.0333 . . . . . . 0.0071
## Pop5 . . 0.0435 . . . . . . . . . 0.0435 . 0.0062
## Pop6 . . . . . . . . . . . . . . .
## Total . . 0.0074 0.0074 0.0074 0.0074 0.0074 0.0221 . . . 0.0074 0.0074 . 0.0053
We have significantly reduced the amount of missing data in myData.
How else could we have solved this?
How about our ability to discriminate genotypes?
genotype_curve(myData)
How does that compare to the previous genotype curve?
How does the maximum value for MLG compare to the number of individuals?
Depending on how we obtain our genetic samples, we may not know if we have genotyped the same individual more than once.
For example, if we have been genotyping samples collected in a non-invasive way (e.g. hair or faeces), then we don’t really have a way of knowing if we have obtained more than one sample per individual.
There is an R package called alleleMatch that can investigate this in detail. However, it is no longer maintained. Thankfully, poppr has some basic functionality to examine this. Using mlg() we can check for the number of unique multilocus genotypes in myData.
mlg(myData)
## #############################
## # Number of Individuals: 136
## # Number of MLG: 133
## #############################
## [1] 133
We have 136 samples, but only 133 unique multilocus genotypes. This suggests that 3 samples correspond to replicates from the same individual(s). (We introduced that in the data on purpose to show you how to deal with such issues.)
Note: the mlg() function did not tell us which samples are the same. For this we can use mlg.id(). But this gives us a very long list, because it also includes the genotypes that occur only once. Let us check the beginning of the output from mlg.id() using head():
head(mlg.id(myData))
## $`1`
## [1] "AFBIZEB9466"
##
## $`2`
## [1] "FRBTAUB9073"
##
## $`3`
## [1] "AFBTSOM9356"
##
## $`4`
## [1] "AFBIZEB9471"
##
## $`5`
## [1] "FRBTAUB9067"
##
## $`6`
## [1] "FRBTAUB9064"
What we want to know is which are the genotypes that occur more than once?
We achieve this by looking for entries in the list with a length greater than one.
myMLG <- mlg.id(myData)
myMLG[lengths(myMLG) > 1]
## $`104`
## [1] "AFBIZEB9482" "AFBIZEB9483"
##
## $`120`
## [1] "AFBIBOR9520" "AFBIBOR9523"
##
## $`125`
## [1] "AFBTSOM9373" "AFBTSOM9374"
As we do not want to include replicates of samples in our data, we should now remove one of each replicated genotype. We will remove AFBIZEB9483, AFBIBOR9523, and AFBTSOM9374
myData <- myData[!indNames(myData) %in% c("AFBIZEB9483", "AFBIBOR9523","AFBTSOM9374"), drop = TRUE]
myData
## /// GENIND OBJECT /////////
##
## // 133 individuals; 14 loci; 137 alleles; size: 109.1 Kb
##
## // Basic content
## @tab: 133 x 137 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 5-14)
## @loc.fac: locus factor for the 137 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: .local(x = x, i = i, j = j, drop = drop)
##
## // Optional content
## @pop: population of each individual (group size range: 15-30)
Now we have 133 unique genotypes in myData.
If this were data from non-invasively collected samples, we may expect there to be more duplicate genotypes that we have not detected yet. Why?
Geeky note:
If you have to remove a lot of samples, you can do it easily by replacing the previous call by:
samplesToKeep <- unlist(lapply(myMLG, function(x) x[1]))
myData <- myData[indNames(myData) %in% samplesToKeep, drop = TRUE]
info_table() can also be used to find out about the ploidy of your data. As the results are presented for all samples, here just the first couple lines of output (achieved by using head())
head(info_table(myData, type = "ploidy"), n = 2)
## Loci
## Samples INRA63 INRA5 ETH225 HEL5 HEL1 INRA35 ETH152 INRA23 ETH10 HEL9 CSSM66 INRA32 ETH3 BM2113
## AFBIBOR9503 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## AFBIBOR9504 2 2 2 2 2 2 2 2 2 2 2 2 2 2
table(info_table(myData, type = "ploidy"), useNA = "always") ## counts different values
##
## 2 <NA>
## 1852 10
The numbers make sense:
nLoc(myData) * nInd(myData)
## [1] 1862
How many alleles expect in a tetraploid?
Could this number be 1 for a diploid?
Another important consideration is if our loci are variable and informative. Otherwise we’re wasting time and space on keeping them! The function informloci() can be used to detect uninformative loci and remove them.
informloci(myData)
## cutoff value: 1.50375939849624 % ( 2 samples ).
## MAF : 0.01
##
## All sites polymorphic
## /// GENIND OBJECT /////////
##
## // 133 individuals; 14 loci; 137 alleles; size: 109.2 Kb
##
## // Basic content
## @tab: 133 x 137 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 5-14)
## @loc.fac: locus factor for the 137 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)
##
## // Optional content
## @pop: population of each individual (group size range: 15-30)
In our case, all loci are informative and they are kept. If we had uninformative loci, and wanted to remove them, we would have run the following.
myData <- informloci(myData)
You can easily explore genetic differences between individuals – within each population – using our home made little function:
pairwise_similarity(myData, pop = "Pop1")
We can represent the results visually:
lapply(unique(myData@pop), function(pop) hist(pairwise_similarity(myData, pop = pop, as_vector = TRUE), xlim = c(0, 1))) # don't forget as_vector = TRUE
How would you do the same thing across all populations?
Consider the following two populations with different genotype frequencies:
AA Aa aa Pop1 50 0 50 Pop2 25 50 25
Do the genotype frequencies of the two populations differ?
Do the allele frequencies of the two populations differ?
Genotype frequencies are the same in males and females.
Individuals mate at random with respect to their genotype at this particular locus (panmixia).
Meiosis is fair.
There is no new genetic material (no mutation).
There is no gene flow (no migration).
The population is of infinite size (no drift).
All matings produce the same average number of offspring (no selection on fecundity).
There are no differences among genotypes in the probability of survival (no selection on survival).
Generations do not overlap (no selection on reproductive rate).
Using frequency of alleles in the current generation …
\(p_t = f(A)\)
\(q_t = f(a)\)
… the frequencies in the next generation can be calculated:
\(f(AA) = p_t^{2}\)
\(f(Aa) = 2p_tq_t\)
\(f(aa) = q_t^{2}\)
\(p_{t+1} = f(AA) + \dfrac{f(Aa)}{2} = p_t^2 + \dfrac{2p_tq_t}{2} = p_t^2 + p_tq_t = p_t^2 + p_t(1-p_t) = p_t^2 + p_t - p_t^2 = p_t\)
\(q_{t+1} = f(aa) + \dfrac{f(Aa)}{2} = q_t^2 + \dfrac{2p_tq_t}{2} = q_t^2 + p_tq_t = q_t^2 + (1-q_t)q_t = q_t^2 + q_t - q_t^2 = q_t\)
Under HWE, the allelic frequencies remain the same over time!
Evolution is a change in allele frequencies in a population over time
When a locus in a population is in HWE, it is not evolving: allele frequencies will stay the same across generations.
If HWE assumptions are not met, evolution can happen (allele frequencies may change).
Mutation, non-random mating, gene flow, genetic drift (caused by finite population size), and natural selection violate HWE assumptions and are thus “mechanisms” by which evolution may proceed.
The HWE is thus the null model of micro-evolution.
So is it good/bad for a locus to be in HWE?
Very broadly speaking, there are two types of population genetics analyses you can do:
those that assume loci are in HWE, and
those that do not.
It is thus important to realise if your data reject the HWE assumptions!
With hw.test() from pegas we can test for HWE across our populations:
hw.test(myData, B = 0) ## for Monte Carlo test instead of asymptotic,
## chi^2 df Pr(chi^2 >)
## INRA63 88.42099 21 3.017570e-10
## INRA5 42.19441 10 6.924324e-06
## ETH225 124.64277 66 1.745822e-05
## HEL5 180.45114 45 0.000000e+00
## HEL1 184.99600 28 0.000000e+00
## INRA35 60.39321 21 1.113412e-05
## ETH152 47.09923 21 9.106719e-04
## INRA23 77.80636 66 1.516741e-01
## ETH10 70.74219 36 4.795813e-04
## HEL9 130.20509 55 4.867201e-08
## CSSM66 82.65631 66 8.075179e-02
## INRA32 209.96734 66 1.110223e-16
## ETH3 44.98197 55 8.303643e-01
## BM2113 184.20731 91 2.747856e-08
## use large value of B (>= 1000)
Here we have conducted a \(\chi^{2}\)-test based on observed and expected genotype frequencies calculated from the allele frequencies.
Would we expect loci to be in HWE if we consider all populations together as we have done?
The Wahlund effect refers to the reduction in the number of heterozygotes due to population structure.
Consider two populations:
AA Aa aa Pop1 50 0 0 Pop2 0 0 50
Here, each population at the HWE.
However, if we treat them as a single population, there are no heterozygotes, and this merged-population is not in HWE.
Let’s check by population. We will split myData by population using the seppop function. This creates a list of genind objects, with every entry in the list consisting of a population.
To apply the HWE test with hw.test() to every item in the list (ie every population), we use (as above) the function lapply().
lapply(seppop(myData), hw.test)
## $Pop1
## chi^2 df Pr(chi^2 >) Pr.exact
## INRA63 0.8333333 6 0.99114998 1.000
## INRA5 12.9166667 6 0.04437854 0.006
## ETH225 21.4390432 36 0.97397104 0.870
## HEL5 37.7654321 28 0.10291932 0.237
## HEL1 16.0969388 15 0.37563846 0.404
## INRA35 10.0888889 6 0.12095828 0.229
## ETH152 10.2835732 10 0.41597667 0.105
## INRA23 19.0958270 21 0.57899284 0.207
## ETH10 13.9506173 21 0.87171109 0.814
## HEL9 26.7469136 28 0.53205905 0.747
## CSSM66 29.6798155 15 0.01312988 0.186
## INRA32 48.1087311 36 0.08546174 0.079
## ETH3 3.8776014 10 0.95269969 0.965
## BM2113 18.1530612 21 0.63929854 0.584
##
## $Pop2
## chi^2 df Pr(chi^2 >) Pr.exact
## INRA63 45.692857 21 1.403507e-03 0.000
## INRA5 19.552921 10 3.377642e-02 0.024
## ETH225 54.411111 28 1.999921e-03 0.000
## HEL5 59.525763 21 1.506288e-05 0.000
## HEL1 3.949720 10 9.495867e-01 0.868
## INRA35 13.606771 10 1.916952e-01 0.036
## ETH152 11.280563 10 3.360814e-01 0.063
## INRA23 9.775034 21 9.816874e-01 0.753
## ETH10 13.687899 15 5.493191e-01 0.230
## HEL9 52.909954 45 1.952332e-01 0.183
## CSSM66 27.401167 36 8.478470e-01 0.592
## INRA32 87.022377 36 4.114494e-06 0.120
## ETH3 11.089471 15 7.462265e-01 0.501
## BM2113 19.367498 28 8.864145e-01 0.893
##
## $Pop3
## chi^2 df Pr(chi^2 >) Pr.exact
## INRA63 5.94687500 10 0.81970664 0.561
## INRA5 7.60416667 6 0.26856035 0.106
## ETH225 7.54933378 3 0.05630431 0.073
## HEL5 9.13333333 10 0.51949831 0.181
## HEL1 0.60000000 1 0.43857803 1.000
## INRA35 3.30740741 3 0.34661300 0.161
## ETH152 3.62388427 1 0.05695575 0.065
## INRA23 0.80740741 3 0.84769447 1.000
## ETH10 0.80740741 3 0.84769447 1.000
## HEL9 2.76962525 6 0.83715593 0.821
## CSSM66 1.51859504 1 0.21783223 0.236
## INRA32 0.01783591 1 0.89375751 1.000
## ETH3 11.76480716 10 0.30110547 0.349
## BM2113 5.93357821 10 0.82081249 0.886
##
## $Pop4
## chi^2 df Pr(chi^2 >) Pr.exact
## INRA63 37.158133 10 5.313702e-05 0.040
## INRA5 1.228844 3 7.460949e-01 0.496
## ETH225 29.540706 21 1.016135e-01 0.089
## HEL5 32.331967 28 2.612127e-01 0.370
## HEL1 57.190924 21 3.366698e-05 0.000
## INRA35 3.473136 3 3.242630e-01 0.267
## ETH152 2.228763 6 8.975031e-01 0.756
## INRA23 27.404740 21 1.578545e-01 0.081
## ETH10 20.406585 15 1.568841e-01 0.001
## HEL9 23.035714 15 8.338452e-02 0.007
## CSSM66 18.778597 36 9.920040e-01 0.376
## INRA32 7.384172 3 6.061047e-02 0.038
## ETH3 34.353741 10 1.608659e-04 0.041
## BM2113 13.117038 21 9.044692e-01 0.707
##
## $Pop5
## chi^2 df Pr(chi^2 >) Pr.exact
## INRA63 13.791312 15 0.5414118 0.529
## INRA5 7.588642 6 0.2698152 0.051
## ETH225 25.176667 21 0.2395936 0.140
## HEL5 4.665958 10 0.9123471 0.923
## HEL1 1.454694 3 0.6927657 1.000
## INRA35 2.296331 1 0.1296800 0.234
## ETH152 1.845937 6 0.9333083 0.854
## INRA23 16.209393 15 0.3682734 0.194
## ETH10 11.739444 15 0.6986344 0.232
## HEL9 40.562233 36 0.2761318 0.352
## CSSM66 21.933673 21 0.4033407 0.247
## INRA32 4.361975 10 0.9295439 0.498
## ETH3 32.563125 28 0.2522038 0.073
## BM2113 12.740575 10 0.2385404 0.141
##
## $Pop6
## chi^2 df Pr(chi^2 >) Pr.exact
## INRA63 0.117551 1 7.317059e-01 1.000
## INRA5 2.346667 3 5.036398e-01 0.653
## ETH225 9.120676 10 5.206906e-01 0.355
## HEL5 8.699421 28 9.998207e-01 0.991
## HEL1 34.147448 10 1.743706e-04 0.107
## INRA35 34.863905 6 4.579186e-06 0.013
## ETH152 3.484444 6 7.460381e-01 0.340
## INRA23 18.391038 15 2.426658e-01 0.095
## ETH10 23.278571 15 7.840172e-02 0.184
## HEL9 30.950400 15 8.920097e-03 0.112
## CSSM66 31.604938 36 6.776680e-01 0.763
## INRA32 3.998186 6 6.769219e-01 0.742
## ETH3 6.064198 21 9.993764e-01 1.000
## BM2113 35.235556 36 5.047583e-01 0.142
Be careful during your interpretation: the population sizes differ and so does the statical power!
unlist(lapply(seppop(myData), nInd))
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## 20 30 15 30 22 16
Some loci, in some populations, are still not in HWE.
Why may this be?
Another important consideration for analysing genetic data is the independence of loci.
Imagine two loci that are physically very close on the same chromosome. Will alleles at these loci be segregating independently?
Do you know how this is measured?
For this reason, and many others (e.g. Fisher’s runaway), it is possible that alleles at one locus may be associated with alleles at another locus. In other words, we may observe non-random association of alleles at different loci. This is called linkage disequilibrium.
Why is this important? If the dependence between loci is not recognised, the results will be biased.
We can look at this with the ia() function of poppr. This function calculates an index of association over all loci in the genind object:
ia(myData, sample = 999)
## Ia p.Ia rbarD p.rD
## 0.84836718 0.00100000 0.06567056 0.00100000
Here we can see the distribution of expected association between alleles at different loci when there is no linkage (dark grey barplot), and the estimate for association among alleles in our total dataset (i.e., all loci and all pops at the same time).
It appears we have a significant association across all populations.
What if we look at the same test per population? To answer this question, we use seppop() and lapply() as before. But now we include ia() and for the argument sample, which defines the number of permutations used to draw the distribution of the association index under the null hypothesis, we want a large number, e.g., 999:
lapply(seppop(myData), ia, sample = 999)
## $Pop1
## Ia p.Ia rbarD p.rD
## 0.25067155 0.03900000 0.01941745 0.03900000
##
## $Pop2
## Ia p.Ia rbarD p.rD
## 0.018941131 0.499000000 0.001466489 0.499000000
##
## $Pop3
## Ia p.Ia rbarD p.rD
## 0.68851067 0.00100000 0.05471501 0.00100000
##
## $Pop4
## Ia p.Ia rbarD p.rD
## -0.018113328 0.510000000 -0.001413501 0.510000000
##
## $Pop5
## Ia p.Ia rbarD p.rD
## 0.46856847 0.00600000 0.03641272 0.00600000
##
## $Pop6
## Ia p.Ia rbarD p.rD
## -0.29912511 0.97300000 -0.02351494 0.97400000
In some populations we observe significant LD.
Is this important?
If we want to know which two loci are in LD, we can use the pair.ia() function. Let’s consider Pop6 using seppop(myData)[["Pop6"]].
pair.ia(seppop(myData)[["Pop6"]])
## Ia rbarD
## INRA63:INRA5 -2.5e-03 -2.5e-03
## INRA63:ETH225 -5.4e-02 -5.4e-02
## INRA63:HEL5 -2.4e-01 -2.4e-01
## INRA63:HEL1 -1.4e-01 -1.4e-01
## INRA63:INRA35 -9.3e-02 -9.6e-02
## INRA63:ETH152 -4.8e-02 -4.8e-02
## INRA63:INRA23 -1.1e-02 -1.1e-02
## INRA63:ETH10 8.5e-02 8.6e-02
## INRA63:HEL9 5.7e-02 5.8e-02
## INRA63:CSSM66 1.2e-01 1.2e-01
## INRA63:INRA32 1.5e-01 1.5e-01
## INRA63:ETH3 -1.0e-01 -1.1e-01
## INRA63:BM2113 -1.5e-01 -1.5e-01
## INRA5:ETH225 -2.4e-01 -2.4e-01
## INRA5:HEL5 -1.4e-01 -1.5e-01
## INRA5:HEL1 -2.1e-01 -2.1e-01
## INRA5:INRA35 2.0e-02 2.1e-02
## INRA5:ETH152 6.4e-02 6.4e-02
## INRA5:INRA23 1.1e-02 1.1e-02
## INRA5:ETH10 -5.6e-03 -5.6e-03
## INRA5:HEL9 -2.0e-01 -2.0e-01
## INRA5:CSSM66 3.1e-03 3.1e-03
## INRA5:INRA32 -7.0e-02 -7.0e-02
## INRA5:ETH3 -1.6e-01 -1.6e-01
## INRA5:BM2113 -9.9e-02 -9.9e-02
## ETH225:HEL5 8.5e-03 8.6e-03
## ETH225:HEL1 3.0e-01 3.0e-01
## ETH225:INRA35 1.2e-01 1.3e-01
## ETH225:ETH152 -1.7e-01 -1.7e-01
## ETH225:INRA23 -2.3e-02 -2.3e-02
## ETH225:ETH10 -5.2e-02 -5.2e-02
## ETH225:HEL9 -2.2e-02 -2.3e-02
## ETH225:CSSM66 -2.6e-01 -2.6e-01
## ETH225:INRA32 -2.4e-02 -2.4e-02
## ETH225:ETH3 2.3e-01 2.4e-01
## ETH225:BM2113 3.6e-02 3.6e-02
## HEL5:HEL1 1.7e-01 1.8e-01
## HEL5:INRA35 -1.5e-01 -1.7e-01
## HEL5:ETH152 2.8e-02 2.9e-02
## HEL5:INRA23 1.1e-01 1.1e-01
## HEL5:ETH10 1.4e-01 1.4e-01
## HEL5:HEL9 2.2e-01 2.4e-01
## HEL5:CSSM66 5.0e-02 5.0e-02
## HEL5:INRA32 -2.2e-02 -2.2e-02
## HEL5:ETH3 1.7e-02 1.7e-02
## HEL5:BM2113 -8.0e-02 -8.1e-02
## HEL1:INRA35 -2.1e-01 -2.1e-01
## HEL1:ETH152 1.1e-01 1.1e-01
## HEL1:INRA23 -3.3e-16 -3.9e-16
## HEL1:ETH10 1.1e-01 1.2e-01
## HEL1:HEL9 5.5e-02 5.5e-02
## HEL1:CSSM66 -5.2e-02 -5.5e-02
## HEL1:INRA32 -1.6e-02 -1.6e-02
## HEL1:ETH3 4.6e-01 4.9e-01
## HEL1:BM2113 -2.9e-02 -2.9e-02
## INRA35:ETH152 -2.3e-01 -2.3e-01
## INRA35:INRA23 1.1e-01 1.2e-01
## INRA35:ETH10 -2.3e-01 -2.4e-01
## INRA35:HEL9 -2.0e-01 -2.0e-01
## INRA35:CSSM66 -9.2e-02 -9.9e-02
## INRA35:INRA32 -1.8e-01 -2.0e-01
## INRA35:ETH3 -1.5e-01 -1.7e-01
## INRA35:BM2113 -6.7e-02 -7.0e-02
## ETH152:INRA23 -8.2e-02 -8.3e-02
## ETH152:ETH10 1.3e-01 1.3e-01
## ETH152:HEL9 -1.1e-01 -1.1e-01
## ETH152:CSSM66 1.8e-01 1.8e-01
## ETH152:INRA32 -1.2e-01 -1.2e-01
## ETH152:ETH3 1.1e-01 1.2e-01
## ETH152:BM2113 1.6e-01 1.6e-01
## INRA23:ETH10 -1.6e-01 -1.6e-01
## INRA23:HEL9 -1.2e-01 -1.2e-01
## INRA23:CSSM66 -5.1e-02 -5.2e-02
## INRA23:INRA32 2.5e-02 2.5e-02
## INRA23:ETH3 2.8e-02 2.8e-02
## INRA23:BM2113 -1.3e-01 -1.3e-01
## ETH10:HEL9 -2.7e-01 -2.8e-01
## ETH10:CSSM66 4.7e-01 4.8e-01
## ETH10:INRA32 -5.2e-02 -5.2e-02
## ETH10:ETH3 1.7e-01 1.8e-01
## ETH10:BM2113 -1.0e-01 -1.0e-01
## HEL9:CSSM66 -7.1e-02 -7.6e-02
## HEL9:INRA32 4.4e-01 4.7e-01
## HEL9:ETH3 -1.6e-01 -1.7e-01
## HEL9:BM2113 -1.3e-01 -1.3e-01
## CSSM66:INRA32 -1.2e-01 -1.2e-01
## CSSM66:ETH3 -2.2e-02 -2.3e-02
## CSSM66:BM2113 -7.5e-02 -7.6e-02
## INRA32:ETH3 -6.4e-02 -6.5e-02
## INRA32:BM2113 -9.2e-03 -9.3e-03
## ETH3:BM2113 -1.9e-02 -2.0e-02
We can see that LD is not the same for all pairs of loci.
Let us change the plot so that it is easier for colour-blind people to see (e.g. for Dan!). For this we add some additional arguments for colour: low = "black" and high = "green".
pair.ia(myData[pop = "Pop6"], low = "black", high = "green")
## Ia rbarD
## INRA63:INRA5 -2.5e-03 -2.5e-03
## INRA63:ETH225 -5.4e-02 -5.4e-02
## INRA63:HEL5 -2.4e-01 -2.4e-01
## INRA63:HEL1 -1.4e-01 -1.4e-01
## INRA63:INRA35 -9.3e-02 -9.6e-02
## INRA63:ETH152 -4.8e-02 -4.8e-02
## INRA63:INRA23 -1.1e-02 -1.1e-02
## INRA63:ETH10 8.5e-02 8.6e-02
## INRA63:HEL9 5.7e-02 5.8e-02
## INRA63:CSSM66 1.2e-01 1.2e-01
## INRA63:INRA32 1.5e-01 1.5e-01
## INRA63:ETH3 -1.0e-01 -1.1e-01
## INRA63:BM2113 -1.5e-01 -1.5e-01
## INRA5:ETH225 -2.4e-01 -2.4e-01
## INRA5:HEL5 -1.4e-01 -1.5e-01
## INRA5:HEL1 -2.1e-01 -2.1e-01
## INRA5:INRA35 2.0e-02 2.1e-02
## INRA5:ETH152 6.4e-02 6.4e-02
## INRA5:INRA23 1.1e-02 1.1e-02
## INRA5:ETH10 -5.6e-03 -5.6e-03
## INRA5:HEL9 -2.0e-01 -2.0e-01
## INRA5:CSSM66 3.1e-03 3.1e-03
## INRA5:INRA32 -7.0e-02 -7.0e-02
## INRA5:ETH3 -1.6e-01 -1.6e-01
## INRA5:BM2113 -9.9e-02 -9.9e-02
## ETH225:HEL5 8.5e-03 8.6e-03
## ETH225:HEL1 3.0e-01 3.0e-01
## ETH225:INRA35 1.2e-01 1.3e-01
## ETH225:ETH152 -1.7e-01 -1.7e-01
## ETH225:INRA23 -2.3e-02 -2.3e-02
## ETH225:ETH10 -5.2e-02 -5.2e-02
## ETH225:HEL9 -2.2e-02 -2.3e-02
## ETH225:CSSM66 -2.6e-01 -2.6e-01
## ETH225:INRA32 -2.4e-02 -2.4e-02
## ETH225:ETH3 2.3e-01 2.4e-01
## ETH225:BM2113 3.6e-02 3.6e-02
## HEL5:HEL1 1.7e-01 1.8e-01
## HEL5:INRA35 -1.5e-01 -1.7e-01
## HEL5:ETH152 2.8e-02 2.9e-02
## HEL5:INRA23 1.1e-01 1.1e-01
## HEL5:ETH10 1.4e-01 1.4e-01
## HEL5:HEL9 2.2e-01 2.4e-01
## HEL5:CSSM66 5.0e-02 5.0e-02
## HEL5:INRA32 -2.2e-02 -2.2e-02
## HEL5:ETH3 1.7e-02 1.7e-02
## HEL5:BM2113 -8.0e-02 -8.1e-02
## HEL1:INRA35 -2.1e-01 -2.1e-01
## HEL1:ETH152 1.1e-01 1.1e-01
## HEL1:INRA23 -3.3e-16 -3.9e-16
## HEL1:ETH10 1.1e-01 1.2e-01
## HEL1:HEL9 5.5e-02 5.5e-02
## HEL1:CSSM66 -5.2e-02 -5.5e-02
## HEL1:INRA32 -1.6e-02 -1.6e-02
## HEL1:ETH3 4.6e-01 4.9e-01
## HEL1:BM2113 -2.9e-02 -2.9e-02
## INRA35:ETH152 -2.3e-01 -2.3e-01
## INRA35:INRA23 1.1e-01 1.2e-01
## INRA35:ETH10 -2.3e-01 -2.4e-01
## INRA35:HEL9 -2.0e-01 -2.0e-01
## INRA35:CSSM66 -9.2e-02 -9.9e-02
## INRA35:INRA32 -1.8e-01 -2.0e-01
## INRA35:ETH3 -1.5e-01 -1.7e-01
## INRA35:BM2113 -6.7e-02 -7.0e-02
## ETH152:INRA23 -8.2e-02 -8.3e-02
## ETH152:ETH10 1.3e-01 1.3e-01
## ETH152:HEL9 -1.1e-01 -1.1e-01
## ETH152:CSSM66 1.8e-01 1.8e-01
## ETH152:INRA32 -1.2e-01 -1.2e-01
## ETH152:ETH3 1.1e-01 1.2e-01
## ETH152:BM2113 1.6e-01 1.6e-01
## INRA23:ETH10 -1.6e-01 -1.6e-01
## INRA23:HEL9 -1.2e-01 -1.2e-01
## INRA23:CSSM66 -5.1e-02 -5.2e-02
## INRA23:INRA32 2.5e-02 2.5e-02
## INRA23:ETH3 2.8e-02 2.8e-02
## INRA23:BM2113 -1.3e-01 -1.3e-01
## ETH10:HEL9 -2.7e-01 -2.8e-01
## ETH10:CSSM66 4.7e-01 4.8e-01
## ETH10:INRA32 -5.2e-02 -5.2e-02
## ETH10:ETH3 1.7e-01 1.8e-01
## ETH10:BM2113 -1.0e-01 -1.0e-01
## HEL9:CSSM66 -7.1e-02 -7.6e-02
## HEL9:INRA32 4.4e-01 4.7e-01
## HEL9:ETH3 -1.6e-01 -1.7e-01
## HEL9:BM2113 -1.3e-01 -1.3e-01
## CSSM66:INRA32 -1.2e-01 -1.2e-01
## CSSM66:ETH3 -2.2e-02 -2.3e-02
## CSSM66:BM2113 -7.5e-02 -7.6e-02
## INRA32:ETH3 -6.4e-02 -6.5e-02
## INRA32:BM2113 -9.2e-03 -9.3e-03
## ETH3:BM2113 -1.9e-02 -2.0e-02
lapply(seppop(myData), pair.ia, low = "black", high = "green")
## $Pop1
## Ia rbarD
## INRA63:INRA5 -0.1304 -0.1337
## INRA63:ETH225 -0.0460 -0.0463
## INRA63:HEL5 0.1069 0.1070
## INRA63:HEL1 -0.0203 -0.0203
## INRA63:INRA35 -0.0205 -0.0205
## INRA63:ETH152 0.1525 0.1533
## INRA63:INRA23 0.1233 0.1233
## INRA63:ETH10 0.0184 0.0184
## INRA63:HEL9 0.0032 0.0032
## INRA63:CSSM66 0.0682 0.0682
## INRA63:INRA32 -0.0119 -0.0119
## INRA63:ETH3 -0.0217 -0.0217
## INRA63:BM2113 -0.0663 -0.0663
## INRA5:ETH225 -0.1969 -0.2072
## INRA5:HEL5 0.0314 0.0326
## INRA5:HEL1 0.0941 0.0961
## INRA5:INRA35 0.3094 0.3164
## INRA5:ETH152 -0.0499 -0.0502
## INRA5:INRA23 0.1522 0.1555
## INRA5:ETH10 0.0907 0.0943
## INRA5:HEL9 0.0399 0.0413
## INRA5:CSSM66 -0.0181 -0.0185
## INRA5:INRA32 0.1526 0.1567
## INRA5:ETH3 0.0671 0.0694
## INRA5:BM2113 0.0251 0.0258
## ETH225:HEL5 -0.0743 -0.0743
## ETH225:HEL1 0.0208 0.0209
## ETH225:INRA35 -0.1170 -0.1177
## ETH225:ETH152 -0.0151 -0.0154
## ETH225:INRA23 -0.0838 -0.0844
## ETH225:ETH10 0.0216 0.0216
## ETH225:HEL9 0.0216 0.0217
## ETH225:CSSM66 0.0063 0.0064
## ETH225:INRA32 -0.1336 -0.1341
## ETH225:ETH3 -0.0256 -0.0257
## ETH225:BM2113 -0.0343 -0.0345
## HEL5:HEL1 -0.1259 -0.1262
## HEL5:INRA35 0.1188 0.1190
## HEL5:ETH152 -0.1378 -0.1395
## HEL5:INRA23 0.0831 0.0833
## HEL5:ETH10 0.0509 0.0509
## HEL5:HEL9 0.0518 0.0518
## HEL5:CSSM66 0.2606 0.2611
## HEL5:INRA32 -0.0229 -0.0229
## HEL5:ETH3 0.0572 0.0572
## HEL5:BM2113 -0.1012 -0.1012
## HEL1:INRA35 0.3598 0.3598
## HEL1:ETH152 0.1222 0.1227
## HEL1:INRA23 -0.0141 -0.0141
## HEL1:ETH10 -0.0446 -0.0447
## HEL1:HEL9 0.0633 0.0633
## HEL1:CSSM66 -0.1475 -0.1475
## HEL1:INRA32 0.0061 0.0061
## HEL1:ETH3 -0.1091 -0.1093
## HEL1:BM2113 0.1816 0.1817
## INRA35:ETH152 0.0256 0.0257
## INRA35:INRA23 0.0773 0.0773
## INRA35:ETH10 0.0538 0.0539
## INRA35:HEL9 -0.0392 -0.0393
## INRA35:CSSM66 0.0067 0.0067
## INRA35:INRA32 0.0149 0.0149
## INRA35:ETH3 -0.0092 -0.0092
## INRA35:BM2113 0.0263 0.0264
## ETH152:INRA23 0.0833 0.0837
## ETH152:ETH10 0.0396 0.0401
## ETH152:HEL9 0.0556 0.0562
## ETH152:CSSM66 -0.0181 -0.0182
## ETH152:INRA32 0.0207 0.0208
## ETH152:ETH3 -0.0746 -0.0754
## ETH152:BM2113 -0.0916 -0.0923
## INRA23:ETH10 -0.0376 -0.0376
## INRA23:HEL9 0.0227 0.0227
## INRA23:CSSM66 -0.0761 -0.0761
## INRA23:INRA32 0.0655 0.0655
## INRA23:ETH3 0.0031 0.0031
## INRA23:BM2113 0.0244 0.0244
## ETH10:HEL9 0.0132 0.0132
## ETH10:CSSM66 0.1741 0.1744
## ETH10:INRA32 -0.0419 -0.0419
## ETH10:ETH3 -0.0262 -0.0262
## ETH10:BM2113 -0.0141 -0.0141
## HEL9:CSSM66 0.0713 0.0713
## HEL9:INRA32 0.0199 0.0199
## HEL9:ETH3 -0.0273 -0.0273
## HEL9:BM2113 0.0613 0.0613
## CSSM66:INRA32 -0.0051 -0.0051
## CSSM66:ETH3 -0.0200 -0.0200
## CSSM66:BM2113 -0.1347 -0.1347
## INRA32:ETH3 0.0398 0.0399
## INRA32:BM2113 0.0047 0.0047
## ETH3:BM2113 0.2078 0.2079
##
## $Pop2
## Ia rbarD
## INRA63:INRA5 0.00240 0.00241
## INRA63:ETH225 0.09044 0.09047
## INRA63:HEL5 0.09870 0.09897
## INRA63:HEL1 -0.01827 -0.01849
## INRA63:INRA35 -0.09862 -0.09866
## INRA63:ETH152 0.10850 0.10857
## INRA63:INRA23 -0.10702 -0.10712
## INRA63:ETH10 -0.02601 -0.02623
## INRA63:HEL9 0.08028 0.08121
## INRA63:CSSM66 0.03369 0.03425
## INRA63:INRA32 0.11670 0.11671
## INRA63:ETH3 -0.07811 -0.07844
## INRA63:BM2113 0.03009 0.03070
## INRA5:ETH225 -0.04664 -0.04666
## INRA5:HEL5 -0.00561 -0.00566
## INRA5:HEL1 0.05620 0.05646
## INRA5:INRA35 -0.10203 -0.10207
## INRA5:ETH152 -0.08975 -0.08977
## INRA5:INRA23 -0.01836 -0.01836
## INRA5:ETH10 0.06641 0.06660
## INRA5:HEL9 0.03607 0.03624
## INRA5:CSSM66 0.00287 0.00289
## INRA5:INRA32 0.08753 0.08770
## INRA5:ETH3 -0.07607 -0.07612
## INRA5:BM2113 0.16316 0.16486
## ETH225:HEL5 0.03426 0.03444
## ETH225:HEL1 0.12889 0.12991
## ETH225:INRA35 -0.04145 -0.04145
## ETH225:ETH152 0.13305 0.13305
## ETH225:INRA23 -0.28862 -0.28866
## ETH225:ETH10 0.01393 0.01400
## ETH225:HEL9 -0.02948 -0.02970
## ETH225:CSSM66 0.03368 0.03409
## ETH225:INRA32 0.01823 0.01824
## ETH225:ETH3 -0.05315 -0.05326
## ETH225:BM2113 -0.06996 -0.07101
## HEL5:HEL1 -0.01301 -0.01335
## HEL5:INRA35 0.03329 0.03347
## HEL5:ETH152 0.05271 0.05304
## HEL5:INRA23 -0.02632 -0.02651
## HEL5:ETH10 -0.05015 -0.05120
## HEL5:HEL9 0.02768 0.02839
## HEL5:CSSM66 0.16015 0.16550
## HEL5:INRA32 -0.09030 -0.09050
## HEL5:ETH3 -0.10342 -0.10486
## HEL5:BM2113 0.07527 0.07812
## HEL1:INRA35 -0.03216 -0.03242
## HEL1:ETH152 -0.03503 -0.03526
## HEL1:INRA23 -0.33821 -0.34025
## HEL1:ETH10 0.01934 0.01935
## HEL1:HEL9 -0.15429 -0.15429
## HEL1:CSSM66 -0.02219 -0.02221
## HEL1:INRA32 -0.24010 -0.24322
## HEL1:ETH3 0.08153 0.08168
## HEL1:BM2113 0.01823 0.01825
## INRA35:ETH152 -0.02894 -0.02894
## INRA35:INRA23 0.08935 0.08936
## INRA35:ETH10 -0.02801 -0.02816
## INRA35:HEL9 -0.07373 -0.07430
## INRA35:CSSM66 0.01891 0.01914
## INRA35:INRA32 0.01211 0.01212
## INRA35:ETH3 0.09723 0.09743
## INRA35:BM2113 -0.09759 -0.09906
## ETH152:INRA23 0.05224 0.05224
## ETH152:ETH10 0.00915 0.00919
## ETH152:HEL9 -0.00850 -0.00855
## ETH152:CSSM66 -0.05437 -0.05495
## ETH152:INRA32 0.01172 0.01173
## ETH152:ETH3 0.10525 0.10541
## ETH152:BM2113 -0.01431 -0.01450
## INRA23:ETH10 0.00062 0.00062
## INRA23:HEL9 0.10540 0.10602
## INRA23:CSSM66 0.02845 0.02873
## INRA23:INRA32 0.16911 0.16933
## INRA23:ETH3 0.02650 0.02653
## INRA23:BM2113 -0.01882 -0.01905
## ETH10:HEL9 -0.10798 -0.10800
## ETH10:CSSM66 -0.04415 -0.04422
## ETH10:INRA32 0.15846 0.15997
## ETH10:ETH3 0.04830 0.04833
## ETH10:BM2113 0.12980 0.13012
## HEL9:CSSM66 0.01308 0.01308
## HEL9:INRA32 0.05656 0.05728
## HEL9:ETH3 -0.06605 -0.06616
## HEL9:BM2113 -0.01292 -0.01294
## CSSM66:INRA32 -0.09074 -0.09241
## CSSM66:ETH3 -0.03896 -0.03913
## CSSM66:BM2113 0.04368 0.04369
## INRA32:ETH3 0.03892 0.03911
## INRA32:BM2113 0.02722 0.02781
## ETH3:BM2113 -0.03823 -0.03846
##
## $Pop3
## Ia rbarD
## INRA63:INRA5 0.0037 0.0037
## INRA63:ETH225 0.1347 0.1348
## INRA63:HEL5 0.2255 0.2259
## INRA63:HEL1 0.0166 0.0171
## INRA63:INRA35 0.0615 0.0622
## INRA63:ETH152 0.2657 0.2724
## INRA63:INRA23 0.1710 0.1719
## INRA63:ETH10 -0.0612 -0.0615
## INRA63:HEL9 -0.0125 -0.0125
## INRA63:CSSM66 0.0146 0.0148
## INRA63:INRA32 -0.0805 -0.0969
## INRA63:ETH3 0.0361 0.0361
## INRA63:BM2113 -0.0574 -0.0578
## INRA5:ETH225 0.2735 0.2756
## INRA5:HEL5 -0.0665 -0.0666
## INRA5:HEL1 -0.0982 -0.1038
## INRA5:INRA35 0.1043 0.1045
## INRA5:ETH152 -0.0708 -0.0713
## INRA5:INRA23 -0.0878 -0.0894
## INRA5:ETH10 -0.0646 -0.0658
## INRA5:HEL9 0.0474 0.0478
## INRA5:CSSM66 -0.0957 -0.0958
## INRA5:INRA32 0.2183 0.2777
## INRA5:ETH3 -0.1280 -0.1290
## INRA5:BM2113 0.1855 0.1894
## ETH225:HEL5 0.0163 0.0164
## ETH225:HEL1 0.0760 0.0778
## ETH225:INRA35 0.1175 0.1195
## ETH225:ETH152 0.0268 0.0276
## ETH225:INRA23 0.0393 0.0394
## ETH225:ETH10 0.0393 0.0394
## ETH225:HEL9 -0.0476 -0.0476
## ETH225:CSSM66 0.1520 0.1539
## ETH225:INRA32 0.2930 0.3472
## ETH225:ETH3 0.2273 0.2273
## ETH225:BM2113 0.2332 0.2340
## HEL5:HEL1 -0.0349 -0.0365
## HEL5:INRA35 0.0163 0.0164
## HEL5:ETH152 0.0150 0.0153
## HEL5:INRA23 0.1525 0.1543
## HEL5:ETH10 -0.0901 -0.0912
## HEL5:HEL9 0.1342 0.1347
## HEL5:CSSM66 -0.0362 -0.0363
## HEL5:INRA32 -0.0336 -0.0417
## HEL5:ETH3 0.1113 0.1117
## HEL5:BM2113 0.0245 0.0249
## HEL1:INRA35 -0.0215 -0.0232
## HEL1:ETH152 0.0955 0.1061
## HEL1:INRA23 0.0918 0.0928
## HEL1:ETH10 0.1245 0.1258
## HEL1:HEL9 0.1442 0.1474
## HEL1:CSSM66 0.2192 0.2345
## HEL1:INRA32 0.1740 0.1870
## HEL1:ETH3 0.0534 0.0545
## HEL1:BM2113 0.1437 0.1449
## INRA35:ETH152 -0.0335 -0.0336
## INRA35:INRA23 0.0023 0.0023
## INRA35:ETH10 0.0884 0.0912
## INRA35:HEL9 0.3394 0.3454
## INRA35:CSSM66 -0.0186 -0.0186
## INRA35:INRA32 0.1562 0.2063
## INRA35:ETH3 0.0938 0.0954
## INRA35:BM2113 0.0754 0.0780
## ETH152:INRA23 0.1255 0.1321
## ETH152:ETH10 -0.0518 -0.0545
## ETH152:HEL9 0.0097 0.0100
## ETH152:CSSM66 0.0212 0.0212
## ETH152:INRA32 -0.0458 -0.0634
## ETH152:ETH3 0.0097 0.0100
## ETH152:BM2113 -0.0358 -0.0378
## INRA23:ETH10 -0.1717 -0.1717
## INRA23:HEL9 0.0010 0.0010
## INRA23:CSSM66 0.1798 0.1845
## INRA23:INRA32 -0.0057 -0.0065
## INRA23:ETH3 0.0812 0.0814
## INRA23:BM2113 -0.1196 -0.1196
## ETH10:HEL9 0.1079 0.1081
## ETH10:CSSM66 0.2020 0.2072
## ETH10:INRA32 -0.0905 -0.1035
## ETH10:ETH3 -0.0524 -0.0525
## ETH10:BM2113 0.0251 0.0251
## HEL9:CSSM66 0.0552 0.0559
## HEL9:INRA32 0.0205 0.0243
## HEL9:ETH3 0.1969 0.1969
## HEL9:BM2113 0.0415 0.0416
## CSSM66:INRA32 -0.0794 -0.1032
## CSSM66:ETH3 0.1606 0.1627
## CSSM66:BM2113 -0.0150 -0.0154
## INRA32:ETH3 0.1359 0.1607
## INRA32:BM2113 0.2219 0.2522
## ETH3:BM2113 -0.0126 -0.0127
##
## $Pop4
## Ia rbarD
## INRA63:INRA5 0.0189 0.0193
## INRA63:ETH225 0.2322 0.2355
## INRA63:HEL5 -0.0103 -0.0105
## INRA63:HEL1 -0.0974 -0.0976
## INRA63:INRA35 -0.0395 -0.0406
## INRA63:ETH152 0.2571 0.2644
## INRA63:INRA23 -0.0532 -0.0533
## INRA63:ETH10 -0.0815 -0.0819
## INRA63:HEL9 0.0645 0.0646
## INRA63:CSSM66 0.0268 0.0272
## INRA63:INRA32 0.0590 0.0590
## INRA63:ETH3 -0.1206 -0.1239
## INRA63:BM2113 0.0758 0.0775
## INRA5:ETH225 -0.0587 -0.0588
## INRA5:HEL5 0.1712 0.1713
## INRA5:HEL1 0.0672 0.0694
## INRA5:INRA35 -0.0071 -0.0071
## INRA5:ETH152 0.0622 0.0622
## INRA5:INRA23 0.0609 0.0616
## INRA5:ETH10 -0.0352 -0.0368
## INRA5:HEL9 -0.0529 -0.0534
## INRA5:CSSM66 0.0601 0.0602
## INRA5:INRA32 -0.0877 -0.0901
## INRA5:ETH3 -0.0537 -0.0537
## INRA5:BM2113 0.0067 0.0067
## ETH225:HEL5 -0.1089 -0.1089
## ETH225:HEL1 0.0468 0.0479
## ETH225:INRA35 -0.0465 -0.0466
## ETH225:ETH152 -0.0421 -0.0422
## ETH225:INRA23 0.1063 0.1070
## ETH225:ETH10 0.0405 0.0419
## ETH225:HEL9 -0.0128 -0.0129
## ETH225:CSSM66 -0.0169 -0.0169
## ETH225:INRA32 0.0823 0.0839
## ETH225:ETH3 0.1292 0.1295
## ETH225:BM2113 0.0455 0.0456
## HEL5:HEL1 0.1188 0.1217
## HEL5:INRA35 -0.1739 -0.1742
## HEL5:ETH152 0.0103 0.0103
## HEL5:INRA23 0.0965 0.0972
## HEL5:ETH10 -0.0408 -0.0422
## HEL5:HEL9 -0.0837 -0.0842
## HEL5:CSSM66 0.0769 0.0769
## HEL5:INRA32 0.0423 0.0431
## HEL5:ETH3 0.0135 0.0135
## HEL5:BM2113 -0.0752 -0.0753
## HEL1:INRA35 0.0277 0.0288
## HEL1:ETH152 -0.0211 -0.0220
## HEL1:INRA23 -0.0687 -0.0690
## HEL1:ETH10 0.0070 0.0070
## HEL1:HEL9 -0.0885 -0.0891
## HEL1:CSSM66 0.0443 0.0453
## HEL1:INRA32 0.1087 0.1087
## HEL1:ETH3 0.0382 0.0397
## HEL1:BM2113 -0.0175 -0.0181
## INRA35:ETH152 0.2385 0.2385
## INRA35:INRA23 -0.1849 -0.1879
## INRA35:ETH10 0.0847 0.0892
## INRA35:HEL9 0.0486 0.0493
## INRA35:CSSM66 -0.0013 -0.0013
## INRA35:INRA32 -0.1815 -0.1876
## INRA35:ETH3 -0.0759 -0.0759
## INRA35:BM2113 -0.0816 -0.0816
## ETH152:INRA23 -0.0750 -0.0763
## ETH152:ETH10 0.0019 0.0020
## ETH152:HEL9 0.0702 0.0713
## ETH152:CSSM66 -0.1483 -0.1487
## ETH152:INRA32 0.0497 0.0515
## ETH152:ETH3 -0.1083 -0.1083
## ETH152:BM2113 -0.0710 -0.0710
## INRA23:ETH10 -0.1717 -0.1735
## INRA23:HEL9 -0.1250 -0.1250
## INRA23:CSSM66 -0.1071 -0.1078
## INRA23:INRA32 0.1609 0.1614
## INRA23:ETH3 -0.0440 -0.0447
## INRA23:BM2113 0.0526 0.0533
## ETH10:HEL9 0.0892 0.0902
## ETH10:CSSM66 -0.0588 -0.0608
## ETH10:INRA32 -0.0321 -0.0322
## ETH10:ETH3 0.0824 0.0868
## ETH10:BM2113 0.0140 0.0146
## HEL9:CSSM66 0.0828 0.0833
## HEL9:INRA32 -0.0896 -0.0899
## HEL9:ETH3 -0.0259 -0.0263
## HEL9:BM2113 0.0618 0.0625
## CSSM66:INRA32 -0.0672 -0.0685
## CSSM66:ETH3 -0.0283 -0.0283
## CSSM66:BM2113 0.0338 0.0339
## INRA32:ETH3 -0.0743 -0.0768
## INRA32:BM2113 -0.0879 -0.0904
## ETH3:BM2113 -0.1078 -0.1078
##
## $Pop5
## Ia rbarD
## INRA63:INRA5 -0.1076 -0.1100
## INRA63:ETH225 0.1188 0.1203
## INRA63:HEL5 0.1641 0.1641
## INRA63:HEL1 0.0354 0.0363
## INRA63:INRA35 0.1662 0.1662
## INRA63:ETH152 0.0955 0.0955
## INRA63:INRA23 0.0663 0.0665
## INRA63:ETH10 -0.0278 -0.0280
## INRA63:HEL9 -0.0484 -0.0484
## INRA63:CSSM66 -0.1827 -0.1831
## INRA63:INRA32 -0.0815 -0.0819
## INRA63:ETH3 -0.0355 -0.0359
## INRA63:BM2113 -0.0633 -0.0633
## INRA5:ETH225 0.0697 0.0697
## INRA5:HEL5 -0.0767 -0.0790
## INRA5:HEL1 -0.1095 -0.1198
## INRA5:INRA35 -0.0068 -0.0070
## INRA5:ETH152 -0.0085 -0.0087
## INRA5:INRA23 0.1922 0.1941
## INRA5:ETH10 -0.0948 -0.0952
## INRA5:HEL9 0.3757 0.3829
## INRA5:CSSM66 -0.0656 -0.0662
## INRA5:INRA32 0.0078 0.0079
## INRA5:ETH3 -0.0051 -0.0051
## INRA5:BM2113 -0.0731 -0.0747
## ETH225:HEL5 0.1444 0.1472
## ETH225:HEL1 -0.1225 -0.1315
## ETH225:INRA35 0.0855 0.0863
## ETH225:ETH152 0.1508 0.1521
## ETH225:INRA23 0.2299 0.2309
## ETH225:ETH10 0.0943 0.0944
## ETH225:HEL9 0.0992 0.1003
## ETH225:CSSM66 0.0231 0.0232
## ETH225:INRA32 -0.0310 -0.0311
## ETH225:ETH3 -0.1444 -0.1444
## ETH225:BM2113 -0.0379 -0.0384
## HEL5:HEL1 -0.0095 -0.0097
## HEL5:INRA35 0.2069 0.2073
## HEL5:ETH152 0.0426 0.0426
## HEL5:INRA23 0.0382 0.0384
## HEL5:ETH10 -0.0398 -0.0403
## HEL5:HEL9 -0.0489 -0.0490
## HEL5:CSSM66 0.0299 0.0300
## HEL5:INRA32 -0.0417 -0.0420
## HEL5:ETH3 -0.1613 -0.1642
## HEL5:BM2113 -0.0199 -0.0199
## HEL1:INRA35 0.1201 0.1238
## HEL1:ETH152 -0.0667 -0.0688
## HEL1:INRA23 -0.0567 -0.0591
## HEL1:ETH10 0.0589 0.0625
## HEL1:HEL9 0.0155 0.0159
## HEL1:CSSM66 -0.0323 -0.0337
## HEL1:INRA32 0.0654 0.0687
## HEL1:ETH3 0.2538 0.2719
## HEL1:BM2113 0.2219 0.2276
## INRA35:ETH152 -0.0344 -0.0344
## INRA35:INRA23 -0.0560 -0.0561
## INRA35:ETH10 -0.0194 -0.0195
## INRA35:HEL9 0.3655 0.3655
## INRA35:CSSM66 -0.0138 -0.0138
## INRA35:INRA32 0.4635 0.4646
## INRA35:ETH3 0.0199 0.0201
## INRA35:BM2113 -0.0652 -0.0652
## ETH152:INRA23 0.0865 0.0866
## ETH152:ETH10 0.1600 0.1608
## ETH152:HEL9 0.0713 0.0713
## ETH152:CSSM66 -0.0297 -0.0298
## ETH152:INRA32 0.0365 0.0366
## ETH152:ETH3 -0.1134 -0.1143
## ETH152:BM2113 0.2145 0.2146
## INRA23:ETH10 0.2513 0.2517
## INRA23:HEL9 0.0974 0.0975
## INRA23:CSSM66 0.0625 0.0625
## INRA23:INRA32 0.0563 0.0564
## INRA23:ETH3 -0.0050 -0.0050
## INRA23:BM2113 -0.1738 -0.1742
## ETH10:HEL9 -0.0892 -0.0898
## ETH10:CSSM66 -0.0393 -0.0394
## ETH10:INRA32 0.2136 0.2137
## ETH10:ETH3 -0.0103 -0.0103
## ETH10:BM2113 0.0375 0.0378
## HEL9:CSSM66 0.0451 0.0452
## HEL9:INRA32 0.1628 0.1634
## HEL9:ETH3 -0.0437 -0.0442
## HEL9:BM2113 -0.0390 -0.0390
## CSSM66:INRA32 0.0835 0.0836
## CSSM66:ETH3 0.0661 0.0663
## CSSM66:BM2113 -0.0339 -0.0340
## INRA32:ETH3 0.1202 0.1204
## INRA32:BM2113 -0.0166 -0.0167
## ETH3:BM2113 0.1059 0.1070
##
## $Pop6
## Ia rbarD
## INRA63:INRA5 -2.5e-03 -2.5e-03
## INRA63:ETH225 -5.4e-02 -5.4e-02
## INRA63:HEL5 -2.4e-01 -2.4e-01
## INRA63:HEL1 -1.4e-01 -1.4e-01
## INRA63:INRA35 -9.3e-02 -9.6e-02
## INRA63:ETH152 -4.8e-02 -4.8e-02
## INRA63:INRA23 -1.1e-02 -1.1e-02
## INRA63:ETH10 8.5e-02 8.6e-02
## INRA63:HEL9 5.7e-02 5.8e-02
## INRA63:CSSM66 1.2e-01 1.2e-01
## INRA63:INRA32 1.5e-01 1.5e-01
## INRA63:ETH3 -1.0e-01 -1.1e-01
## INRA63:BM2113 -1.5e-01 -1.5e-01
## INRA5:ETH225 -2.4e-01 -2.4e-01
## INRA5:HEL5 -1.4e-01 -1.5e-01
## INRA5:HEL1 -2.1e-01 -2.1e-01
## INRA5:INRA35 2.0e-02 2.1e-02
## INRA5:ETH152 6.4e-02 6.4e-02
## INRA5:INRA23 1.1e-02 1.1e-02
## INRA5:ETH10 -5.6e-03 -5.6e-03
## INRA5:HEL9 -2.0e-01 -2.0e-01
## INRA5:CSSM66 3.1e-03 3.1e-03
## INRA5:INRA32 -7.0e-02 -7.0e-02
## INRA5:ETH3 -1.6e-01 -1.6e-01
## INRA5:BM2113 -9.9e-02 -9.9e-02
## ETH225:HEL5 8.5e-03 8.6e-03
## ETH225:HEL1 3.0e-01 3.0e-01
## ETH225:INRA35 1.2e-01 1.3e-01
## ETH225:ETH152 -1.7e-01 -1.7e-01
## ETH225:INRA23 -2.3e-02 -2.3e-02
## ETH225:ETH10 -5.2e-02 -5.2e-02
## ETH225:HEL9 -2.2e-02 -2.3e-02
## ETH225:CSSM66 -2.6e-01 -2.6e-01
## ETH225:INRA32 -2.4e-02 -2.4e-02
## ETH225:ETH3 2.3e-01 2.4e-01
## ETH225:BM2113 3.6e-02 3.6e-02
## HEL5:HEL1 1.7e-01 1.8e-01
## HEL5:INRA35 -1.5e-01 -1.7e-01
## HEL5:ETH152 2.8e-02 2.9e-02
## HEL5:INRA23 1.1e-01 1.1e-01
## HEL5:ETH10 1.4e-01 1.4e-01
## HEL5:HEL9 2.2e-01 2.4e-01
## HEL5:CSSM66 5.0e-02 5.0e-02
## HEL5:INRA32 -2.2e-02 -2.2e-02
## HEL5:ETH3 1.7e-02 1.7e-02
## HEL5:BM2113 -8.0e-02 -8.1e-02
## HEL1:INRA35 -2.1e-01 -2.1e-01
## HEL1:ETH152 1.1e-01 1.1e-01
## HEL1:INRA23 -3.3e-16 -3.9e-16
## HEL1:ETH10 1.1e-01 1.2e-01
## HEL1:HEL9 5.5e-02 5.5e-02
## HEL1:CSSM66 -5.2e-02 -5.5e-02
## HEL1:INRA32 -1.6e-02 -1.6e-02
## HEL1:ETH3 4.6e-01 4.9e-01
## HEL1:BM2113 -2.9e-02 -2.9e-02
## INRA35:ETH152 -2.3e-01 -2.3e-01
## INRA35:INRA23 1.1e-01 1.2e-01
## INRA35:ETH10 -2.3e-01 -2.4e-01
## INRA35:HEL9 -2.0e-01 -2.0e-01
## INRA35:CSSM66 -9.2e-02 -9.9e-02
## INRA35:INRA32 -1.8e-01 -2.0e-01
## INRA35:ETH3 -1.5e-01 -1.7e-01
## INRA35:BM2113 -6.7e-02 -7.0e-02
## ETH152:INRA23 -8.2e-02 -8.3e-02
## ETH152:ETH10 1.3e-01 1.3e-01
## ETH152:HEL9 -1.1e-01 -1.1e-01
## ETH152:CSSM66 1.8e-01 1.8e-01
## ETH152:INRA32 -1.2e-01 -1.2e-01
## ETH152:ETH3 1.1e-01 1.2e-01
## ETH152:BM2113 1.6e-01 1.6e-01
## INRA23:ETH10 -1.6e-01 -1.6e-01
## INRA23:HEL9 -1.2e-01 -1.2e-01
## INRA23:CSSM66 -5.1e-02 -5.2e-02
## INRA23:INRA32 2.5e-02 2.5e-02
## INRA23:ETH3 2.8e-02 2.8e-02
## INRA23:BM2113 -1.3e-01 -1.3e-01
## ETH10:HEL9 -2.7e-01 -2.8e-01
## ETH10:CSSM66 4.7e-01 4.8e-01
## ETH10:INRA32 -5.2e-02 -5.2e-02
## ETH10:ETH3 1.7e-01 1.8e-01
## ETH10:BM2113 -1.0e-01 -1.0e-01
## HEL9:CSSM66 -7.1e-02 -7.6e-02
## HEL9:INRA32 4.4e-01 4.7e-01
## HEL9:ETH3 -1.6e-01 -1.7e-01
## HEL9:BM2113 -1.3e-01 -1.3e-01
## CSSM66:INRA32 -1.2e-01 -1.2e-01
## CSSM66:ETH3 -2.2e-02 -2.3e-02
## CSSM66:BM2113 -7.5e-02 -7.6e-02
## INRA32:ETH3 -6.4e-02 -6.5e-02
## INRA32:BM2113 -9.2e-03 -9.3e-03
## ETH3:BM2113 -1.9e-02 -2.0e-02
There are some typical descriptive statistics you will find in most population genetics papers. These summary statistics give us an overview of some important features of populations under study.
We will focus on the basic ones describing variation within populations and describing variation between populations.
Note: many of these can be calculated over all populations as well as per population or between pairs of populations. This offers information at different scales.
If we are interested in genetic variation in natural populations we often consider heterozygosity.
High heterozygosity means a lot of genetic variability.
Low heterozygosity means little genetic variability.
Let’s consider heterozygosity in a simple system with two alleles at a locus: A and a. Let’s also assume that this population is in HWE.
Then:
\(p = f(A)\)
\(q = f(a)\)
So under HWE, we obtain the following mating table:
A a A p^2 pq a qp q^2
So \(pq + qp = 2pq\) gives the frequency of heterozygote genotypes.
In this two-allele system, heterozygosity is highest at \(p = 0.5\). Let’s visualize:
freqP <- seq(from = 0, to = 1, by = 0.01)
freqQ <- 1 - freqP
plot(2*freqP*freqQ ~ freqQ, type = "l",
ylab = "frequency of heterozygote genotypes",
xlab = "frequency of allele 'a'")
text(0, 0, "AA")
text(1, 0, "aa")
arrows(0.45, 0, 0.05, 0)
text(0.5, 0, "Aa")
arrows(0.55, 0, 0.95, 0)
As you can imagine, this becomes more complex when there are more alleles per locus!
Questions?
Now we will calculate the observed heterozygosity in our populations, as well as the expected heterozygosity if the populations are in HWE.
For this we use the summary() function from adegenet. We can extract the observed heterozygosity using $Hobs and the expected heterozygosity using $Hexp.
To make life convenient, you can extract the values for every locus, place them in a data.frame and compute the difference:
heteroz <- data.frame(Hexp = summary(myData)$Hexp, Hobs = summary(myData)$Hobs)
heteroz$diff <- heteroz$Hexp - heteroz$Hobs
heteroz
We can also visualize how observed heterozygosity is different from expected heterozygosity by subtracting one from the other.
If you remember from before, we saw that loci were not in HWE over all populations.
barplot(heteroz$diff, names.arg = rownames(heteroz),
main = "Heterozygosity: expected-observed",
xlab = "", ylab = "Hexp - Hobs", font.lab = 2, las = 2)
Or we can use another representation, using ggplot2 for a change:
heteroz$loci <- rownames(heteroz) ## ggplot needs names stored as a column
ggplot(heteroz, aes(y = Hexp, x = Hobs)) +
geom_segment(aes(y = Hexp - 0.01, yend = Hobs, xend = Hobs), linetype = "dashed") +
geom_text(aes(label = loci), size = 3) +
geom_abline(slope = 1) +
scale_x_continuous(limits = range(c(heteroz$Hobs, heteroz$Hexp))) +
scale_y_continuous(limits = range(c(heteroz$Hobs, heteroz$Hexp))) +
labs(title = "Heterozygosity: expected vs observed") +
xlab(expression(bold("Observed heterozygosity"))) +
ylab(expression(bold("Expected heterozygosity"))) +
theme_classic() +
coord_fixed()
Now let’s consider observed and expected heterozygosity by population. Here, we will focus on the mean across loci.
adegenet conveniently provides us with a function to calculate this for expected heterozygosity with Hs().
Hs(myData)
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## 0.7038393 0.7051239 0.5183017 0.5654010 0.6011010 0.5996094
To obtain observed heterozygosity there is no short cut (yet), so we need to use lapply() and seppop() again. Here we will calculate the mean of $Hobs per population in myData. As the output of seppop is a list, we also use unlist() to get a simple vector.
Hobs <- lapply(seppop(myData), function(x) mean(summary(x)$Hobs))
Hobs <- unlist(Hobs)
Hobs
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## 0.6964286 0.6751583 0.5105442 0.5054187 0.5796228 0.5982143
Again, we can store the results for observed and expected heterozygosity into a data.frame and compute the difference:
heteroz_per_pop <- data.frame(Hexp = Hs(myData), Hobs = Hobs)
heteroz_per_pop$diff <- heteroz_per_pop$Hexp - heteroz_per_pop$Hobs
heteroz_per_pop
We have already tested if these are significantly different above (while testing for the HWE).
Geeky note:
To look at both the effect of the locus and the population at once, you can write some custom code:
heteroz_matrix <- do.call("cbind", lapply(seppop(myData),
function(x) summary(x)$Hexp - summary(x)$Hobs))
heteroz_matrix
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## INRA63 -0.00625 0.175555556 0.055555556 0.10000000 0.09814050 0.029296875
## INRA5 0.18500 -0.006666667 0.157777778 -0.04388889 0.16632231 -0.005859375
## ETH225 -0.08875 0.128888889 -0.180000000 -0.01111111 0.09637188 -0.078125000
## HEL5 0.14750 0.293697979 0.080000000 0.05111111 -0.03822314 -0.097656250
## HEL1 -0.08375 -0.031111111 -0.055555556 0.18668252 -0.06301653 0.015625000
## INRA35 -0.04000 0.132777778 0.191111111 0.05053508 0.06508264 0.134765625
## ETH152 0.07250 -0.002222222 0.221938776 -0.09833333 -0.01859504 0.121093750
## INRA23 0.07375 -0.058673469 -0.068888889 -0.02913199 0.12603306 0.214843750
## ETH10 -0.08875 -0.097777778 -0.068888889 0.24333333 0.13326446 -0.078125000
## HEL9 -0.05500 0.033888889 -0.006666667 0.22833333 -0.07541322 0.066406250
## CSSM66 -0.06375 -0.042777778 0.124444444 0.10555556 -0.03719008 -0.001953125
## INRA32 0.17750 0.026159334 -0.002222222 0.10611111 0.07954545 -0.169921875
## ETH3 -0.05625 -0.028333333 -0.197777778 -0.07277778 -0.08390023 -0.109375000
## BM2113 -0.07000 -0.103888889 -0.142222222 0.02333333 -0.14772727 -0.021484375
This can be easily plotted with lattice:
levelplot(heteroz_matrix, col.regions = viridis(100),
main = "Heterozygosity: expected-observed",
xlab = "Locus", ylab = "Population")
An effect of population subdivision on genetic variation is the reduction in observed heterozygotes. As we have seen previously (Wahlund effect).
The extent of reduction in observed heterozygotes can be used to quantify the level of differentiation between sub-populations.
This quantification is formalised in a series of hierarchical F-statistics.
–> We are using a measure of departure from HWE to quantify the extent of differentiation between populations.
F-statistics also provide a way to quantify the level of heterozygosity at the individual level, and if/how this departs from expectations of HWE in the population.
How are these linked?
F-statistics, from https://en.wikipedia.org/wiki/F-statistics
We will focus on the two most common F-statistics: \(F_{IS}\) and \(F_{ST}\).
\(F_{IS}\) is known as the inbreeding coefficient.
\(F_{ST}\) is known as the fixation index.
Let’s calculate \(F_{ST}\) for our previous example.
AA Aa aa Pop1 50 0 0 Pop2 0 0 50
Here, \(p = 0.5\) and \(q = 0.5\).
Our expected heterozygosity for the total population (i.e., Pop1 and Pop2 together) is given by
\(H_T = 2pq = 2 \times 0.5 \times 0.5 = 0.5\)
As we have no variation in the sub-populations (i.e., Pop1 or Pop2) \(H_S = 0\) because \(2pq = 2 \times 1 \times 0\)
Then
\(F_{ST} = (0.5 - 0) / 0.5 = 1.0\)
Note: this has been a very simple exploration of how F-statistics are calculated. This has been expanded upon, and not all software or R packages calculate F-stats in the same way. For example, some account for factors such as how individuals disperse (island model vs stepping-stone model), the mutation process (infinite alleles model vs step-wise mutation model) and other bias (e.g. taking into account sampling bias). It is thus important to report (and if possible understand) which method you use:
To cite a package:
citation(package = "pegas")
##
## To cite pegas in a publication use:
##
## Paradis E. 2010. pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics 26: 419-420.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {pegas: an {R} package for population genetics with an integrated--modular approach},
## author = {E. Paradis},
## journal = {Bioinformatics},
## year = {2010},
## volume = {26},
## pages = {419-420},
## }
##
## As pegas is evolving quickly, you may want to cite also its version number (found with 'library(help = pegas)').
packageVersion("pegas") ## don't forget to cite the version number!
## [1] '0.14'
Now let’s get back to myData.
We can use the Fst() function of the pegas package to calculate F-stats by locus, over all populations. But it requires the loci format, so we need to use as.loci() to transform our data.
Fst(as.loci(myData))
## Fit Fst Fis
## INRA63 0.33802078 0.20747328 0.16472316
## INRA5 0.18137553 0.04798419 0.14011462
## ETH225 0.11686741 0.09742513 0.02154091
## HEL5 0.21269341 0.07811146 0.14598506
## HEL1 0.31398545 0.28907109 0.03504480
## INRA35 0.29918890 0.07347563 0.24361288
## ETH152 0.21049163 0.15643982 0.06407582
## INRA23 0.17236638 0.10827954 0.07186876
## ETH10 0.21569817 0.16348381 0.06241883
## HEL9 0.24349243 0.16970350 0.08887058
## CSSM66 0.09195957 0.05406492 0.04006051
## INRA32 0.26891906 0.17504229 0.11379585
## ETH3 0.04997358 0.13763044 -0.10164651
## BM2113 0.06567225 0.12927368 -0.07304411
How would you interpret these values?
We can also build a data.frame with all the information we need:
data.frame(Hobs = summary(myData)$Hobs, Hexp = summary(myData)$Hexp, Fst(as.loci(myData))[, c("Fst", "Fis")])
Often we are also interested in specific \(F_{ST}\) values between pairs of populations. Genetic differentiation among all populations does not need to be equal.
Let’s look at \(F_{ST}\) between pops 1 and 2. We will extract the first two populations from myData and turn this into loci format using as.loci().
myData_pop12 <- as.loci(myData[pop = c("Pop1", "Pop2")])
Now we estimate \(F_{ST}\) over all loci, obtained by using mean(), and constrain ourselves to the column of the output with the \(F_{ST}\) values:
mean(Fst(myData_pop12)[, "Fst"])
## [1] 0.04939851
Now we have a measure of differentiation between Pop1 and Pop2.
This pairwise \(F_{ST}\) value is the mean of \(F_{ST}\) values of all loci. We can check \(F_{ST}\) for every locus using Fst(myData_pop12)[, "Fst"].
As any estimate, the \(F_{ST}\) is measured with some uncertainty. To represent this uncertainty, we can compute the 95\(\%\) confidence interval of the mean \(F_{ST}\) value using a “very simple” bootstrap (more advanced implementations are possible):
Fst_boot <- replicate(100, {
myData_pop12 <- myData_pop12[, c(1, sample(attr(myData_pop12, "loci"), replace = TRUE)[-1])]
mean(Fst(myData_pop12)[, "Fst"])
})
quantile(Fst_boot, c(0.025, 0.975))
## 2.5% 97.5%
## 0.02409615 0.07172776
Let’s plot the distribution of \(F_{ST}\) we just obtained:
hist(Fst_boot, xlim = c(0, 0.2))
abline(v = 0, col = 2, lwd = 2, lty = 2)
Geeky note:
For now, there is no simple solution to compute pairwise \(F_{ST}\) in R, but here is a solution using a home made function we prepared for you:
pairwise_F(myData, confint = FALSE) ## without Confidence Interval
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## Pop1 NA 0.04939851 0.14266348 0.10909326 0.07058474 0.1335223
## Pop2 0.04939851 NA 0.18210328 0.15354673 0.13306532 0.1896989
## Pop3 0.14266348 0.18210328 NA 0.09044349 0.06150365 0.2385422
## Pop4 0.10909326 0.15354673 0.09044349 NA 0.03950569 0.2062700
## Pop5 0.07058474 0.13306532 0.06150365 0.03950569 NA 0.1627551
## Pop6 0.13352231 0.18969892 0.23854224 0.20626996 0.16275506 NA
Let’s do the same thing while estimating the amount of uncertainty:
pairwise_Fst <- pairwise_F(myData, confint = TRUE) ## with CI
lapply(pairwise_Fst, round, digit = 2) ## output after rounding with 2 digits
## $mean
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## Pop1 NA 0.05 0.14 0.11 0.07 0.13
## Pop2 0.05 NA 0.18 0.15 0.13 0.19
## Pop3 0.14 0.18 NA 0.09 0.06 0.24
## Pop4 0.11 0.15 0.09 NA 0.04 0.21
## Pop5 0.07 0.13 0.06 0.04 NA 0.16
## Pop6 0.13 0.19 0.24 0.21 0.16 NA
##
## $upr
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## Pop1 NA 0.07 0.17 0.14 0.10 0.19
## Pop2 0.07 NA 0.24 0.18 0.17 0.24
## Pop3 0.17 0.24 NA 0.11 0.08 0.30
## Pop4 0.14 0.18 0.11 NA 0.06 0.26
## Pop5 0.10 0.17 0.08 0.06 NA 0.22
## Pop6 0.19 0.24 0.30 0.26 0.22 NA
##
## $lwr
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## Pop1 NA 0.03 0.10 0.08 0.05 0.08
## Pop2 0.03 NA 0.12 0.13 0.10 0.15
## Pop3 0.10 0.12 NA 0.07 0.04 0.17
## Pop4 0.08 0.13 0.07 NA 0.02 0.14
## Pop5 0.05 0.10 0.04 0.02 NA 0.09
## Pop6 0.08 0.15 0.17 0.14 0.09 NA
We can plot this output:
levelplot(pairwise_Fst$mean, col.regions = rev(grey.colors(30)))
It is worth checking if your populations have alleles that cannot be found in other populations. These are called “private alleles.”
We can use the private_alleles() function of poppr to get information about which alleles can only be found in a population.
Alleles are labelled as locus.allele.
private_alleles(myData)
## INRA63.171 INRA5.149 ETH225.141 ETH225.137 HEL5.149 HEL5.159 INRA35.108 INRA35.106 ETH152.205 INRA23.203 INRA23.217 INRA23.193 ETH10.223 ETH10.213
## Pop1 0 0 1 0 2 0 0 0 0 0 0 0 0 0
## Pop2 1 1 0 0 0 0 3 0 0 2 0 0 0 0
## Pop3 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## Pop4 0 0 0 1 0 0 0 1 0 0 0 0 1 0
## Pop5 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## Pop6 0 0 0 0 0 1 0 0 1 0 0 0 0 1
## HEL9.149 HEL9.151 CSSM66.171 CSSM66.191 INRA32.202 INRA32.164 INRA32.160 INRA32.168 ETH3.103 ETH3.113 BM2113.124 BM2113.150 BM2113.128 BM2113.132
## Pop1 0 0 0 0 1 1 0 0 1 0 0 0 0 0
## Pop2 1 0 0 0 0 0 1 0 0 0 0 0 0 0
## Pop3 0 0 0 0 0 0 0 0 0 0 9 0 0 0
## Pop4 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Pop5 0 1 1 0 0 0 0 0 0 1 0 1 0 0
## Pop6 0 0 0 1 0 0 0 1 0 0 0 0 5 12
## BM2113.126 BM2113.144
## Pop1 0 0
## Pop2 0 0
## Pop3 0 0
## Pop4 0 0
## Pop5 0 0
## Pop6 2 2
We can summarise the information for each population:
private_alleles_per_pop <- apply(private_alleles(myData), 1, sum)
private_alleles_per_pop
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## 6 9 10 3 5 26
Private alleles will impact measures of differentiation between populations. This is especially true, if their frequency is high (i.e., if they are common).
Note that the number of private alleles may be influence by many factors, including sample size.
What would you expect the relationship to be?
Let’s explore this using a plot:
ind_per_pop <- sapply(seppop(myData), nInd)
plot(ind_per_pop, private_alleles_per_pop,
xlab = "Sample size", ylab = "Number of private alleles", las = 1,
col = NULL)
text(y = private_alleles_per_pop,
x = ind_per_pop,
labels = names(ind_per_pop))
Individual-based analyses frequently do not make assumptions about HWE, because they look at the differences between individuals.
But consider how this kind of information must be presented.
individual x individual
It is not surprising that we have come up with ways to visualize this kind of information.
Two very common ways are trees (or networks) and PCA.
One benefit of displaying information in this manner is that we can see if there are groupings of individuals that are more similar to eachother than they are to other individuals.
adegenet and poppr offer us the ability to generate other distance measures. Most of the ones offered in poppr work for individuals (rather than populations).
Note: several of these distance measures make strong assumptions about the biological nature of our genetic samples.
Let’s try another measure that makes no assumptions: Prevosti’s distance, which we can use with prevosti.dist()
mynj_prevosti <- nj(prevosti.dist(seppop(myData)[["Pop1"]]))
plot(mynj_prevosti, type = "unrooted")
Note: we did not need to use 1 - in this case, because the function prevosti.dist() directly provide a distance.
Let us compare the two trees we just made to see if they match:
cophyloplot(mynj, mynj_prevosti,
assoc = cbind(mynj$tip.label, mynj_prevosti$tip.label))
Are you happy with that?
Now that we know how to make a NJ tree for a small dataset, we can do it for all samples in myData.
bignj <- nj(prevosti.dist(myData))
plot(bignj, type = "unrooted")
Geeky note: we can improve this plot by plotting colours and changing the type of tree. Here a possibility would be a “fan” plot:
plot(bignj, type = "fan", show.tip.label = FALSE, x.lim = c(-0.7, 0.7), no.margin = TRUE)
tiplabels(text = rownames(myData@tab),
frame = "none",
col = rainbow(nPop(myData))[as.numeric(myData@pop)], cex = 0.8, offset = 0.05)
legend("topleft", fill = rainbow(nPop(myData)),
legend = popNames(myData), bty = "n",
title = "Population")
Or perhaps a “radial” plot:
plot(bignj, type = "radial", show.tip.label = FALSE, x.lim = c(-0.7, 0.7), no.margin = TRUE)
tiplabels(text = rownames(myData@tab),
frame = "none",
col = rainbow(nPop(myData))[as.numeric(myData@pop)], cex = 0.8, offset = 0.05)
legend("topleft", fill = rainbow(nPop(myData)),
legend = popNames(myData), bty = "n",
title = "Population")
An alternative to neighbour joining is to reduce the dimensionality of the problem so that it can be plotted in 2 dimensions instead of one dimension per locus. Many options exist but the most common one is the so-called Principal Component Analysis, or PCA for short.
This is how you run a PCA:
myData_matrix <- scaleGen(myData, center = FALSE, scale = FALSE, NA.method = "mean")
mypca <- dudi.pca(myData_matrix, center = TRUE, scale = FALSE, scannf = FALSE, nf = Inf)
The PCA creates new dimensions…
head(mypca$li[, 1:4]) ## only show head for first 4 axes
which are uncorrelated…
head(zapsmall(cor(mypca$li))[, 1:4]) ## only show head for first 4 axes
## Axis1 Axis2 Axis3 Axis4
## Axis1 1 0 0 0
## Axis2 0 1 0 0
## Axis3 0 0 1 0
## Axis4 0 0 0 1
## Axis5 0 0 0 0
## Axis6 0 0 0 0
and which capture a decreasing amount of variation of the original loci:
barplot(mypca$eig,
names.arg = colnames(mypca$li),
cex.names = 0.5,
col = heat.colors(length(mypca$eig)),
las = 2, ylab = "Inertia")
or in cumulated percentage:
barplot(cumsum(100*mypca$eig/sum(mypca$eig)),
names.arg = colnames(mypca$li),
cex.names = 0.5,
col = rev(heat.colors(length(mypca$eig))),
las = 2, log = "y",
ylab = "Cumulative proportion of variance explained")
There are many ways to plot a PCA, but here we are interested in projecting the individuals into the new loci space, so we use the function s.class():
s.class(mypca$li, fac = pop(myData),
col = rainbow(nPop(myData)), grid = FALSE, xax = 1, yax = 2, cpoint = 0)
#s.label(mypca$li, add.plot = TRUE, boxes = FALSE, clabel = 0.5)
add.scatter.eig(mypca$eig[1:10], xax = 1, yax = 2, ratio = 0.15)
The PCA does not force the group to be different, it just shows the overall structure. In contrast, the DAPC is a method that allows to explore whether some combinations of alleles would allows the characterisation of distinct groups.
Here is an example:
myclusters <- find.clusters(myData, n.clust = 3, n.pca = Inf) ## better not fixing n.clust and selecting it interactivelly!!!
dapc1 <- dapc(myData, pop = myclusters$grp, n.pca = Inf, n.da = Inf)
scatter(dapc1)
So yes, it seems that it is possible to create groups… but it is not immediatly clear what these groups correspond to in this dataset.
So let’s display the original populations on top of the plot:
s.class(dapc1$ind.coord, fac = pop(myData),
col = rainbow(nPop(myData)), grid = FALSE, xax = 1, yax = 2, cpoint = 1)
Another type of visual representation is to mimic that one produced by the famous program Structure:
compoplot(dapc1, show.lab = TRUE, legend = FALSE, cex.names = 0.4,
lab = paste(pop(myData), rownames(dapc1$tab)))